Recipe recommendation using ingredient networks
Chun-Yuen Teng, School of Information, University of Michigan, Ann Arbor, MI, USA
Yu-Ru Lin, IQSS, Harvard University; CCS, Northeastern University, Boston, MA
Lada A. Adamic, School of Information, University of Michigan, Ann Arbor, MI, USA
ABSTRACT
The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose large scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. These insights include preferences for cooking methods depending on the nutritional value extracted from food, and the geographic region from which the recipe originates. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications - Data mining

General Terms
Measurement; Experimentation

Keywords
ingredient networks, recipe recommendation
1. INTRODUCTION
The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that had been passed down for generations, to new concoctions that they created that afternoon, having been motivated in part by the ability to share the result online. Once shared, the recipes are implemented and evaluated by other users, who supply ratings and comments.
The desire to look up recipes online may at first appear odd given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial additional value in online recipes, beyond being able to look them up. While the Joy of Cooking contains a single recipe for Swedish meatballs, allrecipes.com has “Swedish Meatballs I”, “II”, and “III”, submitted by different users, along with 4 other variants, including “The Amazing Swedish Meatball”. Each variant has been reviewed, from 329 reviews for “Swedish Meatballs I” to 5 reviews for “Swedish Meatballs III”. The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g. using ground turkey instead of beef, skipping the “cream of wheat” because it is rarely on hand, etc.
The wealth of information captured by online collaborative recipe sharing sites is revealing not only of the fundamentals of cooking, but also user preferences. The co-occurrence of ingredients in tens of thousands of recipes provides information about which ingredients go well together, and when a pairing is unusual. Users’ reviews provide clues as to the flexibility of a recipe, and the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which will contain vetted but perhaps not the best variants for some individuals’ tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not.
In this paper, we seek to distill the collective knowledge and preference about cooking through mining a popular recipe-sharing website. To extract such information, we first parse the unstructured texts from the recipes and users’ reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users’ knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy 0.792), while most of the prediction power comes from the ingredient networks (84%).
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset and Section 4 provides descriptive analysis of the data. Section 5 discusses the extraction of the ingredient complement network and the characteristics of this network. Section 6 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 7 presents our experiments on recipe recommendation and Section 8 concludes.
2. RELATED WORK
Recipe recommendation has been the subject of much prior work. Typically the goal has been to suggest recipes to users based on their past recipe ratings [5][15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either treating each ingredient equally [4] or according to the focus of the recipe [20]. For example, for “grilled chicken with basil dressing”, chicken is assigned a higher weight than basil. Instead of modeling recipes using ingredients, Wang et al. [17] represent the recipes as graphs which are built on ingredients and cooking directions, and they demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and does not take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.
Another branch of research has focused on recommending recipes based on desired nutritional intake or promoting healthy food choices. Geleijnse et al. [8] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [10] built a personalized recipe recommendation system based on availability of ingredients and personal nutritional needs. Shidochi et al. [14] proposed an algorithm to extract replaceable ingredients from recipes in order to satisfy users’ various demands, such as calorie constraints and food availability. Their method identifies substitutable ingredients by matching the cooking actions that correspond to ingredient names. However, their assumption that substitutable ingredients are subject to the same processing methods is less direct and specific than extracting substitutions directly from user-contributed suggestions. In the present paper, we use a data-driven approach to construct a substitute ingredient network, and derive clusters of substitutable ingredients from this network.
3. DATASET
Allrecipes is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. Since its launch in 1997, it has received more than 535 million annual visits from over 5 million members. It hosts 16 customized international sites for users to share their recipes in their native languages. Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe, and suggest changes and improvements. In addition to rating and commenting on recipes, users are able to save them as favorites or recommend them to others through a forum.
We downloaded 46,337 recipes from allrecipes.com with all of the information listed above, as well as several classifications, such as a region (e.g. the Midwest region of the US or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users’ recipe preferences, we crawled 1,976,920 reviews which include reviewers’ ratings, review text, and the number of users who voted the review useful. We further downloaded information on the 530,609 users who reviewed and rated recipes. This information includes links to users’ pages, their interests, the date they joined the site, their cooking experience level, hometown, and the city they currently live in.
3.1 Data preprocessing
The first step in processing the recipes is identifying the ingredients listed. Matching on predefined lists of ingredients often missed or misidentified ingredients commonly supplied by users. We therefore derived the list of ingredients from the recipes themselves through the following procedure. We removed quantifiers, such as “1 lb” or “2 cups”, and words referring to consistency or temperature, e.g. chopped or cold, and applied a few other heuristics, such as removing content in parentheses. For example “1 (28 ounce) can baked beans (such as Bush’s Original®)” is identified as “baked beans”. We erred on the side of not conflating potentially identical or highly similar ingredients, e.g. “cheddar cheese”, used in 2450 recipes, was considered different from “sharp cheddar cheese”, occurring in 394 recipes.
We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the 1000 most common ingredient names as our finalized ingredient list. Each of the top 1000 ingredients occurred in 23 or more recipes, with plain salt making an appearance in 21,916 recipes. These ingredients also accounted for 94.9% of ingredient entries in the recipe dataset. The remaining ingredients were missed because of high specificity (e.g. yolk-free egg noodle), referencing brand names (e.g. Planters almonds), rarity (e.g. serviceberry), misspellings, or not being a food (e.g. “nylon netting”).
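The normalization and frequency ranking described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the word lists `UNITS` and `DESCRIPTORS` and the function names are assumptions introduced here, and the actual heuristics were likely more extensive.

```python
import re
from collections import Counter

# Hypothetical word lists; the paper does not publish the ones it used.
UNITS = {"lb", "lbs", "pound", "pounds", "ounce", "ounces", "cup", "cups",
         "teaspoon", "teaspoons", "tablespoon", "tablespoons", "can", "cans"}
DESCRIPTORS = {"chopped", "cold", "diced", "sliced", "minced"}


def normalize_ingredient(line):
    """Reduce a raw ingredient line to a bare ingredient name."""
    text = line.lower()
    text = re.sub(r"\([^)]*\)", " ", text)  # drop content in parentheses
    text = re.sub(r"[\d/.]+", " ", text)     # drop numeric quantities
    words = re.findall(r"[a-z']+", text)
    kept = [w for w in words if w not in UNITS and w not in DESCRIPTORS]
    return " ".join(kept)


def top_ingredients(recipes, k=1000):
    """Rank names by the number of recipes they occur in and keep the top k."""
    counts = Counter()
    for lines in recipes:
        # A set counts each ingredient at most once per recipe.
        counts.update({normalize_ingredient(l) for l in lines} - {""})
    return [name for name, _ in counts.most_common(k)]
```

Note that the sketch deliberately does not merge near-duplicates, so “cheddar cheese” and “sharp cheddar cheese” remain distinct names, in line with the conservative choice described above.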
The remaining processing task involved identifying cooking processes from the directions. We first identified all heating methods using a listing in the Wikipedia entry on cooking [18]. For example, baking, boiling, and steaming are all ways of heating the food. We then identified mechanical ways of processing the food such as chopping and grinding, and other chemical techniques such as marinating and brining.
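One simple way to tag directions with these three categories is keyword matching. The sketch below assumes a tiny vocabulary taken from the examples in the text; the full lists used in the study (derived from the Wikipedia entry) are not reproduced here, and the names `PATTERNS` and `tag_methods` are hypothetical.

```python
import re

# Illustrative patterns only, covering the examples named in the text.
PATTERNS = {
    "heating": re.compile(r"\b(bak(e|es|ed|ing)|boil(s|ed|ing)?|steam(s|ed|ing)?)\b"),
    "mechanical": re.compile(r"\b(chop(s|ped|ping)?|grind(s|ing)?)\b"),
    "chemical": re.compile(r"\b(marinat(e|es|ed|ing)|brin(e|es|ed|ing))\b"),
}


def tag_methods(directions):
    """Return the set of method categories mentioned in a directions text."""
    text = directions.lower()
    return {category for category, pattern in PATTERNS.items() if pattern.search(text)}
```

Explicit word-form patterns are used instead of prefix matching so that, for instance, “bring to a boil” is not mistaken for brining.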
4. DESCRIPTIVE ANALYSIS
One of the interesting aspects of this dataset is that it allows us to obtain a large-scale view of cooking methods. Here we discuss how different cooking methods vary with user and regional preference.
4.1 Why we cook
It has been suggested that cooking played a significant role in human evolution by allowing us to extract more energy value from food [19]. An experiment measuring energy expended by the Burmese python in digesting meat has shown that processing the meat by either grinding or cooking reduces the digestive cost to the snake, and combining both processing methods reduces energy cost more than each method individually [1]. Interestingly, it appears that average recipe ratings correlate with the ability of the processing method to reduce digestive cost. Table 1 shows that recipes that call for cooking food have higher ratings than ones that merely break it down mechanically, which in turn are rated more highly than ones that simply “mix” or “toss” ingredients together. Furthermore, we observe that recipes with additional chemical processing, e.g. fermenting and marinating, tend to receive higher ratings than ones preparing the food with only heating and mechanical methods. However, perhaps due to the additional time and planning they require, they occur in only about 8% of the recipes in the dataset.
Table 1: Occurrence and average ratings of cooking methods

Method               Occurrence   Average rating
Mechanical methods   34759        3.60
Heating methods      40238        4.11
Chemical methods     3686         4.14
The preference for multiple food processing methods might at first be interpreted as a reflection of the sophistication of the recipe, with more complex recipes rated more highly. However, in general we find no correlation between the number of steps or the number of ingredients and the average rating a recipe receives, making it more likely that the digestibility of the prepared food is a factor in how highly rated it is.
4.2 Regional preferences
While cooking methods that make food more digestible tend to be preferred, choosing one method over another appears to be a question of regional taste. About 5.8% (n=2693) of recipes were classified into one of 5 US regions: Midwest, Northeast, South, West Coast (including Alaska and Hawaii), and Mountain. Figure 1 shows significantly (χ² test p-value < 0.001) varying preferences in the different US regions among the most popular cooking methods. Boiling and simmering, both involving heating food in hot liquids, are more common in the South and Midwest. Marinating and grilling are relatively more popular in the West and Mountain regions, but in the West more grilling recipes involve seafood (18/42 = 42%) relative to other regions combined (7/106 = 6%). Frying is popular in the South and Northeast. Baking is a universally popular and versatile technique, which is often used for both sweet and savory dishes, and is slightly more popular in the Northeast and Midwest. Examination of individual recipes reflecting these frequencies shows that these differences can be tied to differences in demographics, immigrant culture and availability of local ingredients, e.g. seafood.

Figure 1: The percentage of recipes by region that apply a specific cooking method. (Bar chart; methods: bake, fry, roast, grill, marinate, simmer, boil; horizontal axis: % in recipes, 0 to 40; regions: West Coast, South, Northeast, Mountain, Midwest.)
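To illustrate the kind of χ² computation referred to above, the following sketch applies a Pearson χ² test of independence to the only raw counts quoted in the text, the seafood share of grilling recipes in the West versus the other regions. The paper does not report a test on these particular counts, so this is purely a worked example; the function name is an assumption, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts quoted in the text: grilling recipes with and without seafood.
west = (18, 42 - 18)
other_regions = (7, 106 - 7)
stat, p_value = chi_square_2x2(*west, *other_regions)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.2g}")
```

The test behind Figure 1 would instead use a larger regions-by-methods contingency table, for which the same statistic generalizes but the p-value requires the χ² distribution with more degrees of freedom.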
5. INGREDIENT COMPLEMENT NETWORK
Can we learn how to combine ingredients from the data? Here we employ the occurrences of ingredients across recipes to learn users’ knowledge about combinations of ingredients.
We constructed an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients (a, b):

$$\mathrm{pmi}(a; b) = \log \frac{p(a, b)}{p(a)\,p(b)},$$

where

$$p(a, b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}}, \quad
p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \quad
p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}}.$$
The PMI compares the probability that two ingredients occur together against the probability that would be expected if they occurred independently. Complementary ingredients tend to occur together far more often than would be expected by chance, and therefore have a high PMI.
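The definition above translates directly into a few lines of code. The sketch below is a minimal illustration on a made-up toy corpus; the function name `pmi_table` and the four toy recipes are assumptions, not data from the study.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(recipes):
    """Return {frozenset({a, b}): pmi} for every pair co-occurring at least once.

    Pairs that never co-occur are omitted, since their PMI is minus infinity.
    """
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for ingredients in recipes:
        items = sorted(set(ingredients))
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        frozenset((a, b)): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


# Hypothetical toy corpus: two sweet and two savory recipes.
toy_recipes = [
    {"flour", "sugar", "egg"},
    {"flour", "sugar"},
    {"garlic", "onion", "egg"},
    {"garlic", "onion"},
]
pmi = pmi_table(toy_recipes)
```

In this toy corpus flour and sugar always appear together and receive a positive PMI, while egg, which is spread evenly over both groups, has a PMI of zero with each of its partners.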
Figure 2 shows a visualization of ingredient complementarity. Two distinct subcommunities of recipes are immediately apparent: one corresponding to savory dishes, the other to sweet ones. Some central ingredients, e.g. egg and salt, actually are pushed to the periphery of the network. They are so ubiquitous that, although they have many edges, the edges are all weak, since these ingredients don’t show particular complementarity with any single group of ingredients.
We further probed the structure of the complementaritynetwork by applying a network clustering algorithm. Thealgorithm confirmed the existence of two main clusters con-taining the vast majority of the ingredients. An interestingsatellite cluster is that of ingredients for mixed drinks, which
cherry gelatin
graham cracker
low fat cottage cheese
pork shoulder roast
heavy whipping cream tofu
bok choy
butter cracker
baking soda
pimento pepper
milk powder
chorizo sausage
lady�nger
steak sauce
crimushroom
radishe
shiitake mushroom
pesto
brownie mix
pumpkin pie spice
rye �our
cardamom
sa�ron thread
linguine
corn
fat free sour cream
basmati rice
bittersweet chocolate
bay
corn chip
cracker
french green bean
poppy seed
vegetable oil
grape tomato
pizza crust dough
low sodium beef broth
club soda
lard
soy saucepanko bread
couscou
crab meat
mango
unpastry shell
catalina dressing
pasta shell
italian salad dressing
mexican corn
decorating gel
italian bread
napa cabbage
onion powder
white wine vinegar
cocktail rye bread
basil sauce
crouton
brown gravy mix
barbeque sauce
apple cider vinegar
hoagie roll
milk chocolate candy kisse
�ounder
salt black pepper
maraschino cherry juice
chow mein noodle
tiger prawn
banana pepper
cranberry
vermicelli pasta
root beer
strawberry jam
lemon gelatin mix
creamed corn
pretzel
pie shell
sun�ower kernel
rump roast
romaine
vegetable stock
lemon pepper seasoning
guacamole
louisiana hot sauce
cabbage
yellow onion
super�ne sugar
orange peel
raspberry
cumin seedcandied mixed fruit peel
cream of coconut
bow tie pasta
creme fraiche
currant
pork chop
turkey gravy
fat free half and half
chicken ramen noodle
wooden skewer
whipping cream
mace
seasoning salt
mozzarella cheesepasta sauce
lean pork
broccoli �oweret
tomatillo
lemonade
tomato paste
caesar dressing
basil pesto
melon liqueur
coconut milk
whole wheat pastry �our
muenster cheese
lump crab meat
angel food cake
ring
cheese tortellini
spiral pasta
vanilla pudding
cauli�oweret
smoked sausage
hot dog
pita bread
cocoa powder
garbanzo bean
tart apple
wheat bran
hot pepper sauce
chili
refried bean
salmon steak
white cheddar cheese
low fat mayonnaise
grapefruit
dijon mustard
tomato juice
yellow squash
baking apple
cream of tartar
vodka
rye bread
white chip
�at iron steak
linguine pasta
fennel
whole wheat bread
baking mix
alfredo pasta sauce
margarine
confectioners' sugarfruit gelatin mix
pork
balsamic vinegar
pork loin chop
jicama
pre pizza crust
triple sec
teriyaki sauce
cola carbonated beverage
polish sausage
cracked black pepper
poblano chile pepper
individually wrapped caramel
roast beef
bread stu�ng mix
eggnog
pear
caramel
beet
worcestershire sauce
chicken stock
horseradish
semisweet chocolate chip
basil
red grape
plum
cinnamon sugar
fajita seasoning
rice noodle
powdered milk
star anise pod
short grain rice
ramen noodle
vegetable
coconut oil
whiskey
lime gelatin mix
peanut oil
ham
ginger root
lima bean
pimento stu�ed green olive
hoisin sauce
round steak
stu�ng
part skim ricotta cheese
broiler fryer chicken up
milk chocolate chip
turbinado sugar
vegetable shortening
tarragon vinegar
golden delicious apple
turkey
rigatoni pasta
stu�ng mix
milk
juiced
burgundy wine
red kidney bean
dill
candied pineapple
german chocolate cake mix
arborio rice
sugar free vanilla pudding mix
pine nut
green apple
cucumber oreganopearl onion
stu�ed green olive
whipped topping mix
broccoli
pinto bean
pasta
beef short rib
gelatin
garlic powder
rutabaga
chicken liver
pepperjack cheese
herb
lemon gras
sweet potato
pineapple ring
parsley �ake
pie �lling
spice cake mix
butterscotch chip
greek yogurt
vanilla ice cream
seafood seasoning
parsnip
applesauce
chinese �ve spice powder
salt pepper
beef broth
cherry tomato
sage
vanilla
vital wheat gluten
artichoke heart
mixed berry
bacon dripping
self rising �our
nilla wafer
navy bean
bacon
egg yolk
wonton wrapper
chocolate pudding mix
salsa
coconut
tomato based chili sauce
marsala wine
mussel
manicotti shell
anise extract
mustard seed
nutmeg
cayenne pepper
black bean pepperokra
asparagu
mustard powder
�rmly brown sugar
balsamic vinaigrette dressing
chicken breastoyster
ditalini pasta
old bay seasoning tm
brown rice
process american cheese
chocolate
miso paste
pineapple
iceberg lettuce
pearl barley
oat
greek seasoning
biscuit
clove
browning sauce
chicken bouillon powder
green pea
bread dough
cream cheesepeanut butter chip
silken tofu
pineapple chip
sea scallop
ricotta cheese
papaya
red cabbage
egg substitute
zesty italian dressing
devil's food cake mix
bagel
sour mix
lamb
irish stout beer
sea salt
romaine lettuce
kalamata olive
salt
monosodium glutamate
rice wine
white potato
rum extract
grape jelly
crescent roll dough
beer
phyllo dough
fettuccine pasta
chili seasoning mix
biscuit mix
candy coated chocolate
green cabbage
ranch bean
cream of celery soup
apple pie �lling
caper
nectarine
white mushroom
banana
orange gelatin mix
1% buttermilk
apple jelly
dinner roll
sugar pumpkin
salad green
shrimp
cheese ravioli
chicken wing
sour cream
saltine
cornmeal
mixed vegetable
beef tenderloin
sherry
rotini pasta
mexican cheese blend
kosher salt black pepper
mayonnaise
lobster
white onion
chocolate cookie
white bread
french baguette
bread
vanilla frosting
anise seed
ranch dressing mix
wild rice
hot
canadian bacon
corn�akes cereal
wax bean
cantaloupe
non fat yogurt
lite whipped topping
spaghetti squash
egg roll wrapper
solid pack pumpkin
recipe pastry
asafoetida powder
co�ee powder
italian sauce
amaretto liqueur
shortening
turmeric
semolina �our
pomegranate juice
corned beef
skewer
shallotspanish onion
tapioca
provolone cheese
chile sauce
vanilla bean
chile pepperangel hair pasta
pumpkin
tilapia
brie cheese
cottage cheese
banana liqueur
lemon
smoked salmon
ginger paste
brown mustard
peanut butter
escarole
sour milk
olive oil
country pork rib
pastry shell
adobo seasoning
candy coated milk chocolate
curryghee
alfredo sauce
yellow cake mix
granny smith apple
beef chuck
chocolate hazelnut spread
maple syrup
squid
gingersnap cooky
raspberry gelatin
molasse
lemon cake mix
�sh stock
cook
grenadine syrup
pu� pastry
rum
grapefruit juice
tahini
black pepperbutternut squash
key lime juice
sirloin steak
macaroni
butter shortening
brown lentil
chicken broth
chili bean
pickling spice
yellow food coloring
great northern bean
mixed nut
green chile
salmon
english mu�n
co�ee liqueur
non fat milk powder
buttermilk
distilled white vinegar
golden syrup
powdered fruit pectin
green chily
grape
raspberry gelatin mix
low fat sour cream
topping
pineapple juice
red lettuce
orange zest
ketchup
chunk chicken
steak seasoning
sandwich roll
crystallized ginger
kosher salt
roma tomato
red bean
red candied cherry
sesame seed
beef stock
cashew
popped popcorn
apricot nectar
any fruit jam
processed cheese food
red pepper
coleslaw mix
white cake mix
cherry pie �lling
canola oil
whole wheat �our
honey
long grain
marinara sauce
yellow summer squash
to�ee baking bit
whole milk
trout
onion separated
low fat cream cheese
corn oil
oat bran
cream of potato soup
allspice berry
mandarin orange
cumin
saltine cracker
swiss chard
fenugreek seed
�sh sauce
eggplant
baby corn
cider vinegar
orange sherbet
debearded
beef bouillon
kernel corn
vanilla vodka
chicken leg quarter
mintfeta cheese
lime juice
raspberry jam
cooking oil
white corn
herb stu�ng mix
lemon lime soda
pork sausage
ziti pasta
orange marmalade
yogurt
bean
ginger garlic paste
crescent dinner roll
scallop
walnut oil
smoked ham
red food coloring
triple sec liqueur
fat free evaporated milk
walnutbaking chocolate
blueberry
caramel ice cream topping
bacon grease
fat free italian dressing
steak
�g
miracle whip ‚Ñ
potato starch
luncheon meat
brandy based orange liqueur
smoked paprika
pu� pastry shell
raspberry preserve
apple butter
tomato sauce
white rice
beef stew meat
taco seasoning mix
date
whipped topping
marshmallow
co�ee
butterscotch schnapp
red wine vinegar
orange
chicken thigh
mild italian sausage
blueberry pie �lling
yeast
lime peel
rice �our
chocolate cake mix
barbecue sauce
monterey jack cheese
halibut
beef round steak
seed
sour cherry
pork sparerib
orange roughy
barley nugget cereal
leek
maraschino cherry
chickpea
fettuccini pasta
orange juice
blue cheese dressing
yam
garam masala
black eyed pea
penne pasta
serrano chile pepper
�ourchive
marjoram
herb stu�ng
beef sirloin
beef
maple extract
bamboo shoot
lemon extract
meat tenderizer
kielbasa sausage
low sodium chicken broth
asparagus
cod
italian seasoning
lime gelatin
vegetable bouillon
andouille sausage
collard green
blackberry
beef gravy
green grape
tamari
fruit
malt vinegar
strawberry gelatin
lemon gelatin
green olive
poultry seasoning
prune
beef consomme
chili powder
dressing
fennel seed
gruyere cheese
jellied cranberry sauce
chipotle pepper
vanilla extract
apricot
linguini pasta
cranberry sauce
port wine
process cheese
cornish game hen
cilantro
green chile pepper
wheat
bread machine yeast
tube pasta
biscuit baking mix
cream corn
spinach
low fat whipped topping
irish cream liqueur
candy
zucchini
mild cheddar cheese
orange gelatin cornstarch
cheese
snow pea
low fat margarine
green candied cherry
vermouth
brandy
white grape juice
corn bread mix
broccoli �oret
vidalia onion
cocktail sauce
pickled jalapeno pepper
beaten egg
hamburger bun
black walnut
dill pickle juice
dill pickle relish
habanero pepper
white chocolate chip
veal
powdered non dairy creamer
lasagna noodle
gingerapricot jam
imitation crab meat
chicken soup base
white bean
tarragon
onion soup mix
thousand island dressing
red lentil
pancake mix
wheat germ
fat free mayonnaise
yukon gold potato
long grain rice
carrot
cauli�ower �oret
vegetable cooking spray
craw�sh tail
peppermint extract
brussels sprout
onion salt buttermilk biscuit
white kidney bean
mango chutney
black olive
meatless spaghetti sauce
curry powder
coriander
red snapper
biscuit dough
sausage
cheddar cheese soup
lettuce
pork loin roast
lemon pepper
red curry paste
egg noodle
hot sauce
raspberry vinegar
butter cooking spray
peach schnapp
eggspicy pork sausage
mixed fruit
cat�sh
venison
yellow pepper
carbonated water
pumpkin seed
new potato
lemon juice
chocolate pudding
watermelon
chicken breast half
gorgonzola cheese
buttery round cracker
apple pie spice
process cheese sauce
jasmine rice
lemon pudding mix
cooking sherry
strawberry preserve
french bread
toothpick
sauce
corn tortilla chip
garlic paste
salt free seasoning blend
elbow macaronipickle
cream of chicken soup
cardamom pod
persimmon pulp
chicken
liquid smoke
cocoa
pound cake
bell pepper
food coloring
coconut extract
chocolate chip
berry cranberry sauce
red bell pepper
seashell pasta
american cheese
oatmeal
sourdough bread
cornbread
mixed salad green
arugula
oil
parmesan cheeseclam juice
brick cream cheesecereal
italian parsley
milk chocolate
rice wine vinegar
hot dog bun
pistachio pudding mix
curd cottage cheese
garlic salt
chocolate cookie crust
orange extract
cream of mushroom soup
sa�ron
mushroom
tortilla chip
white hominy
green beans snapped
dill pickle
french onion soup
skim milk
tequila
�ax seed
low fat cheddar cheese
red wine
nut
apple cider
candied cherry
cheddar cheese
gingerroot
chocolate frosting
low fat yogurt
peppercorn
pepperoni
artichoke
baby pea
crisp rice cereal
potato chip
coconut cream
angel food cake mix
onion �ake
salad shrimp
taco seasoning
champagne
peach
low fat
yellow cornmeal
pork roast
baby spinach
portobello mushroom cap
blue cheese
strawberry gelatin mix
pink lemonade
chestnut
strawberry
oyster sauce
sugar snap pea
ka�r lime
anchovy
stu�ed olive
herb bread stu�ng mix
half and half
serrano pepper
coconut rum
red apple
cherry
�ank steak
round
peppermint candy
butter bean
almond
white vinegarcelery seed
corn syrupfat free cream cheese
cannellini bean
clam
mustard
scallion
potato �ake
parsley
fat free yogurt
pita bread round
red pepper �ake
onion
bourbon whiskey
creme de menthe liqueur
golden raisin
pancetta bacon
apple juice
egg white
fontina cheese
kale
asiago cheese
spiced rum
farfalle pasta
lobster tail
mirin
leg of lamb
tomato
zested
sauerkraut
unpie crust
bourbon
lean beef
tuna steak
wild rice mix
raisin
chocolate syrup
juice
cajun seasoning
cauli�owerwaterlemon yogurt
tapioca �our
vanilla yogurt
pimiento
hazelnut liqueur
thyme
part skim mozzarella cheese
mandarin orange segment
cinnamon
corn tortilla
crispy rice cereal
colby monterey jack cheese
apricot preserve
chipotle chile powder
swiss cheese
white wine
baking powdergraham cracker crust
vanilla wafer
lime
sugar based curing mixture
cream cheese spread
celeryolive
simple syrup
asian sesame oil
bacon bit
sharp cheddar cheese
rice vinegar
sea salt black pepper
curry paste
beef chuck roast
butter extract
pork loin
ginger ale
chicken leg
adobo sauce
lime zest
ham hock
watercres
pastry
seasoning
lentil
mascarpone cheese
baker's semisweet chocolate
acorn squash
chunk chicken breast
pepperoni sausage
brown sugar
fusilli pasta
kaiser roll
red delicious apple
honey mustard
unbleached �our
vinegar
spicy brown mustard
chuck roast
candied citron
vegetable combination
beef �ank steak
red chile pepper
avocado
quinoa
cake �our
whole wheat tortilla
dill seed
turnip
vegetable broth
sugarsugar cookie mix neufchatel cheese
coriander seed
apple
vegetable soup mix
chocolate sandwich cooky
colby cheese
sourdough starter
green bean
pecansoftened butter matzo meal
hash brown potato
vanilla pudding mix
pickle relish
noodlered potato
white chocolate
pistachio nut
green food coloring
lemon zest
chutney
splenda
buttermilk baking mix
caraway seedmaple �avoring
taco sauce
chili oil
kiwi
lean turkey
garlic
golden mushroom soup
grit
chili sauce
rosemary
green salsa
corkscrew shaped pasta
marshmallow creme
enchilada sauce
baby carrot
savory
cinnamon red candy
corn mu�n mix
black peppercorn
green bell pepperwater chestnut
french dressing
almond extract
rose water
paprika
english cucumber
nutritional yeast
unpie shell
ears corn
cream of shrimp soup
plum tomato
bratwurst
green lettuce
lemon lime carbonated beverage
ice
creole seasoning
grape juice
italian sausage
pizza crust
orzo pasta
white rum
crescent roll
italian cheese blend
rhubarb
chicken bouillon
prosciutto
cream
red onion
marinated artichoke heart
jalapeno chile pepper
tater tot
pork tenderloin
spaghetti
gin
semisweet chocolate
pie crust
cooking spray
spaghetti sauce
bread �our
butterscotch pudding mix
romano cheese
bulgur
hungarian paprika
white balsamic vinegar
picante sauce
meatball
tuna
chili without bean
bean sprout
baking cocoa
chile paste
butter
yellow mustard
haddock
sun�ower seed
processed american cheese
russet potato
allspice
giblet
button mushroom
peanut
kidney bean
portobello mushroom
ranch dressing
almond paste
hazelnut
beef brisket
sake
fruit cocktail
beef sirloin steak
pimento
honeydew melon
low fat milk
salami
german chocolate
pizza sauce
green tomato
orange liqueur
celery salt
chocolate mix
cranberry juice
white pepper
barley
soy milk
sweet
poblano pepper
macadamia nut
goat cheese
tomato soup
tea bag
mixed spice
low fat peanut butter
turkey breast
lemon peel
tomato vegetable juice cocktail
jalapeno pepper
low sodium soy sauce
processed cheese
limeade
arti�cial sweetener
sesame oil
heavy cream
fat free chicken broth
pork shoulder
evaporated milk
corn�ake
bay scallop
chocolate waferwhite sugar
rapid rise yeast potato
�our tortilla
chicken drum
chocolate ice cream
pepper jack cheese
baking potato
italian dressing mix
Figure 2: Ingredient complement network. Two ingredients share an edge if they occur together more thanwould be expected by chance and if their pointwise mutual information exceeds a threshold.
evident as a constellation of small nodes located near the top of the sweet cluster in the visualization of Figure 2. The cluster includes the following ingredients: lime, rum, ice, orange, pineapple juice, vodka, cranberry juice, lemonade, tequila, etc.
For each recipe we examined the minimum, average, and maximum pairwise pointwise mutual information between ingredients. The intuition is that complementary ingredients would yield higher ratings, while ingredients that don’t go together would lower the average rating. We found that while the average and minimum pointwise mutual information between ingredients are uncorrelated with ratings, the maximum is very slightly positively correlated with the average rating for the recipe (ρ = 0.09, p-value < 10^-10). This suggests that having at least two complementary ingredients very slightly boosts a recipe’s prospects, but having clashing or unrelated ingredients does not seem to do harm.
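The per-recipe PMI summary described above can be sketched as follows. This is only an illustration under stated assumptions: probabilities are estimated as the fraction of recipes containing an ingredient (or a pair), the toy recipes are made up, and the function names `pmi_table` and `recipe_pmi_summary` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(recipes):
    """Estimate pmi(a; b) = log(p(a, b) / (p(a) p(b))) from recipe co-occurrence.

    Assumption: p(.) is the fraction of recipes containing the ingredient(s).
    """
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))          # sorted so each pair has one canonical key
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        (a, b): math.log((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }

def recipe_pmi_summary(recipe, pmi):
    """Return (min, average, max) pairwise PMI for a recipe with >= 2 ingredients."""
    items = sorted(set(recipe))
    values = [pmi[p] for p in combinations(items, 2)]
    return min(values), sum(values) / len(values), max(values)

# Made-up toy corpus, for illustration only.
recipes = [
    ["lime", "rum", "ice"],
    ["lime", "rum"],
    ["flour", "butter", "sugar"],
    ["butter", "sugar"],
    ["lime", "sugar"],
]
pmi = pmi_table(recipes)
lo, avg, hi = recipe_pmi_summary(recipes[0], pmi)
```

The three summary values could then be correlated with average recipe ratings; the correlation step itself is omitted here.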
6. RECIPE MODIFICATIONS
Co-occurrence of ingredients aggregated over individual recipes reveals the structure of cooking, but tells us little about how flexible the ingredient proportions are, or whether some ingredients could easily be left out or substituted. An experienced cook may know that apple sauce is a low-fat alternative to oil, or may know that nutmeg is often optional, but a novice cook may implement recipes literally, afraid that deviating from the instructions may produce poor results. While a traditional hardcopy cookbook would provide few such hints, they are plentiful in the reviews submitted by users who implemented the recipes, e.g. “This is a great recipe, but using fresh tomatoes only adds a few minutes to the prep time and makes it taste so much better”, or another comment about the same salsa recipe “This is by far the best recipe we have ever come across. We did however change it just a little bit by adding extra onion.”
As the examples illustrate, modifications are reported even when the user likes the recipe. In fact, we found that 60.1% of recipe reviews contain words signaling modification, such as “add”, “omit”, “instead”, “extra” and 14 others. Furthermore, it is the reviews that include changes that have a statistically significantly higher average rating (4.49 vs. 4.39, t-test p-value < 10^-10), and lower rating variance (0.82 vs. 1.05, Bartlett test p-value < 10^-10), as is evident in the distribution of ratings, shown in Fig. 3. This suggests that flexibility in recipes is not necessarily a bad thing, and that reviewers who don’t mention modifications are more likely to think of the recipe as perfect, or to dislike it entirely.
Figure 3: The modifiability of ingredients. The line represents equal number of occurrences where the reviews suggested to increase as opposed to decrease the amount of the ingredient in the dish. [Plot: x-axis, rating (1 to 5); y-axis, proportion of reviews with given rating (0.0 to 0.6); series: no modification, with modification.]
In the following, we describe the recipe modifications extracted from user reviews, including adjustment, deletion and addition. We then present how we constructed an ingredient substitute network based on the extracted information.
6.1 Adjustments
Figure 4: Modifications to the 50 most common ingredients, derived from recipe reviews. The line denotes equal numbers of suggested increases and decreases. [Plot: x-axis, (# reviews adjusting down)/(# recipes); y-axis, (# reviews adjusting up)/(# recipes); both axes run from 0.01 to 1.00 on a logarithmic scale; points are labeled with ingredient names.]
Some modifications involve increasing or decreasing the amount of an ingredient in the recipe. In this and the following analyses, we split the review on punctuation such as commas and periods. We used simple heuristics to detect when a review suggested a modification: adding/using more/less of an ingredient counted as an increase/decrease. Doubling or increasing counted as an increase, while reducing, cutting, or decreasing counted as a decrease. While it is likely that there are other expressions signaling the adjustment of ingredient quantities, using this set of terms allowed us to compare the relative rate of modification, as well as the frequency of increase vs. decrease between ingredients. The ingredients themselves were extracted by performing a maximal character match within a window following an adjustment term.
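The adjustment heuristics of Section 6.1 might be approximated along the following lines. This is a minimal sketch, not the authors' implementation: the exact term list, the 40-character window, and the names `detect_adjustments` and `find_ingredient` are assumptions made here, and "maximal character match" is interpreted as picking the longest ingredient name found in the window.

```python
import re

# Assumed term lists, modeled on the description in the text.
INCREASE = re.compile(
    r"\b(?:(?:add(?:ed|ing)?|us(?:e|ed|ing))\s+more|doubl(?:e|ed|ing)|increas(?:e|ed|ing))\b"
)
DECREASE = re.compile(
    r"\b(?:(?:add(?:ed|ing)?|us(?:e|ed|ing))\s+less|reduc(?:e|ed|ing)|cut(?:ting)?|decreas(?:e|ed|ing))\b"
)

def find_ingredient(window, ingredients):
    """Return the longest ingredient name occurring in the window, or None."""
    hits = [name for name in ingredients if name in window]
    return max(hits, key=len) if hits else None

def detect_adjustments(review, ingredients, window_chars=40):
    """Return a list of (direction, ingredient) pairs found in a review."""
    found = []
    # Split the review into clauses on punctuation, as described in the text.
    for clause in re.split(r"[,.;:!?]+", review.lower()):
        for direction, pattern in (("increase", INCREASE), ("decrease", DECREASE)):
            match = pattern.search(clause)
            if match:
                # Look for an ingredient in a window following the adjustment term.
                window = clause[match.end():match.end() + window_chars]
                name = find_ingredient(window, ingredients)
                if name:
                    found.append((direction, name))
    return found

# Made-up review text, for illustration only.
found = detect_adjustments(
    "I doubled the garlic powder, and used less salt. Great recipe!",
    ["garlic", "garlic powder", "salt", "butter"],
)
```

Preferring the longest match keeps "garlic powder" from being counted as plain "garlic"; simple substring matching will still produce some false positives on real reviews.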
Figure 4 shows the ratios of the number of reviews suggesting modifications, either increases or decreases, to the number of recipes that contain the ingredient. Two patterns are immediately apparent. Ingredients that may be perceived as being unhealthy, such as fats and sugars, are, with the exception of vegetable oil and margarine, more likely to be modified, and to be decreased. On the other hand, flavor enhancers such as soy sauce, lemon juice, cinnamon, Worcestershire sauce, and toppings such as cheeses, bacon and mushrooms, are also likely to be modified; however, they tend to be added in greater, rather than lesser quantities. Combined, the patterns suggest that good-tasting but “unhealthy” ingredients can be reduced, if desired, while spices, extracts, and toppings can be increased to taste.
6.2 Deletions and additions
Recipes are also frequently modified such that ingredients are omitted entirely. We looked for words indicating that the reviewer did not have an ingredient (and hence did not use it), e.g. “had no” and “didn’t have”. We further used “omit/left out/left off/bother with” as indication that the reviewer had omitted the ingredients, potentially for other reasons. Because reviewers often used simplified terms, e.g. “vanilla” instead of “vanilla extract”, we compared words in proximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.
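The 4-character-gram matching described above can be illustrated with a short sketch. It assumes that n-grams are counted as plain bags of overlapping 4-character substrings and that the best-scoring recipe ingredient is taken as the match; the helper names are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=4):
    """Bag of overlapping character n-grams (empty if the text is shorter than n)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(phrase, recipe_ingredients):
    """Return (ingredient, score) for the recipe ingredient most similar to the phrase."""
    grams = char_ngrams(phrase)
    scored = [(cosine(grams, char_ngrams(name)), name) for name in recipe_ingredients]
    if not scored:
        return (None, 0.0)
    score, name = max(scored)
    return (name, score) if score > 0 else (None, 0.0)

# A reviewer writes "vanilla"; the recipe lists "vanilla extract".
name, score = best_match("vanilla", ["vanilla extract", "white sugar", "egg"])
```

In practice a minimum-similarity cutoff would probably be needed to reject weak matches; the text does not state one, so none is applied here.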
To identify additions, we simply looked for the word “add”, but omitted possible substitutions. For example, we would use “added cucumber”, but not “added cucumber instead of green pepper”, the latter of which we analyze in the following section. We then compared the addition to the list of ingredients in the recipe, considering the addition valid only if the ingredient does not already belong in the recipe.
Table 2 shows the correlation of the ingredient modifications. As might be expected, the more frequently an ingredient occurs in recipes, the more times its quantity has the opportunity to be modified, as is evident in the strong correlation between the recipe frequency and both increases and decreases recommended in reviews. However, if we take the proportion of modifications to the number of recipes the ingredient appears in, these are typically negatively correlated with the frequency of the ingredient, e.g. deletions/recipe with ρ = −0.22, additions ρ = −0.25, increases ρ = −0.26. For example, salt is so essential, appearing in over 21,000 recipes, that we detected only 18 reviews where it was explicitly dropped. In contrast, Worcestershire sauce, appearing in 1,542 recipes, is dropped explicitly in 148 reviews.
As might also be expected, additions are positively correlated with increases, and deletions with decreases. However, additions and deletions are very weakly negatively correlated, indicating that an ingredient that is added frequently is not necessarily omitted more frequently as well.
Table 2: Correlations between ingredient modifications
            addition  deletion  increase  decrease
recipes       0.41      0.22      0.61      0.68
addition               -0.15      0.79      0.11
deletion                          0.09      0.58
increase                                    0.39
6.3 Ingredient substitute network
Replacement relationships show whether one ingredient is preferable to another. The preference could be based on taste, availability, or price. Some ingredient substitution tables can be found online (see footnote 1), but they are neither extensive nor do they contain information about the relative frequency of each substitution. Thus, we found an alternative source for extracting replacement relationships: users’ comments, e.g. “I replaced the butter in the frosting by sour cream, just to soothe my conscience about all the fatty calories”.
To extract such knowledge, we first parsed the reviews as follows: we considered several phrases to signal replacement relationships: “replace a with b”, “substitute a with b”, “a instead of b”, etc., and matched a and b to our list of ingredients.
1 e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx
Figure 5: The network of ingredient substitution. Nodes are sized according to the number of times they have been recommended as a substitute for another ingredient in reviews, and colored according to their indegree.
We constructed an ingredient substitute network to capture users’ knowledge about ingredient replacement. This weighted, directed network consists of ingredients as nodes. We thresholded and eliminated any suggested substitutions that occurred fewer than 5 times. We then determined the weight of each edge by p(b|a), the proportion of substitutions of ingredient a that suggest ingredient b. For example, 68% of substitutions for white sugar were to splenda, an artificial sweetener, and hence the assigned weight for the sugar → splenda edge is 0.68. The resulting network is shown in Figure 5.
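The extraction and weighting steps for the substitute network can be sketched as below. Several details are assumptions made for illustration: the regular expressions cover only the three quoted phrase patterns, “a instead of b” is read as b being replaced by a, p(b|a) is normalized over the substitutions that survive the threshold, and the counts in the example are invented so that the sugar → splenda weight comes out at 0.68.

```python
import re
from collections import Counter

def extract_substitution(clause, ingredients):
    """Return (original, substitute) if the clause matches a known pattern, else None."""
    # Longest names first so that "sour cream" wins over "cream".
    alt = "|".join(re.escape(i) for i in sorted(ingredients, key=len, reverse=True))
    forward = re.search(
        rf"(?:replac\w*|substitut\w*)\s+(?:the\s+)?({alt})\s+(?:with|by)\s+(?:the\s+)?({alt})",
        clause,
    )
    if forward:
        return forward.group(1), forward.group(2)
    # "a instead of b": b is the original ingredient, a is the substitute.
    reverse = re.search(rf"({alt})\s+instead\s+of\s+(?:the\s+)?({alt})", clause)
    if reverse:
        return reverse.group(2), reverse.group(1)
    return None

def build_substitute_network(pairs, min_count=5):
    """Map each retained edge (a, b) to p(b|a), dropping edges seen fewer than min_count times."""
    counts = Counter(pairs)
    kept = {edge: c for edge, c in counts.items() if c >= min_count}
    totals = Counter()
    for (a, _), c in kept.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in kept.items()}

ingredients = {"butter", "sour cream", "cream", "honey", "white sugar", "sugar", "splenda"}
first = extract_substitution("i replaced the butter with sour cream", ingredients)
second = extract_substitution("used honey instead of white sugar", ingredients)

# Invented counts, for illustration only.
pairs = (
    [("white sugar", "splenda")] * 17
    + [("white sugar", "honey")] * 8
    + [("white sugar", "maple syrup")] * 2   # below the threshold, dropped
)
weights = build_substitute_network(pairs)
```

Whether the denominator of p(b|a) should include the thresholded-out substitutions is not spelled out in the text; normalizing after thresholding makes the outgoing weights of each node sum to one.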
The substitution network shown in Fig. 5 exhibits strong clustering. We examined this structure by applying the map generator tool by Rosvall et al. [13], which uses a random walk approach to identify clusters in weighted, directed networks. The resulting clusters, and their relationships to one another, are shown in Fig. 6. The derived clusters could be used when following a relatively new recipe which may not have many reviews, and therefore few suggestions for ingredient substitutions. If one does not have all ingredients at hand, one could examine the content of one’s fridge and pantry and match it with other ingredients found in the same cluster as the ingredient called for by the recipe. Table 3 lists the contents of a few such sample ingredient clusters, and Fig. 7 shows two example clusters extracted from the substitute network.
Table 3: Clusters of ingredients that can be substituted for one another. A maximum of 5 additional ingredients for each cluster are listed, ordered by PageRank.
main               other ingredients
chicken            turkey, beef, sausage, chicken breast, bacon
olive oil          butter, apple sauce, oil, banana, margarine
sweet potato       yam, potato, pumpkin, butternut squash, parsnip
baking powder      baking soda, cream of tartar
almond             pecan, walnut, cashew, peanut, sunflower s.
apple              peach, pineapple, pear, mango, pie filling
egg                egg white, egg substitute, egg yolk
tilapia            cod, catfish, flounder, halibut, orange roughy
spinach            mushroom, broccoli, kale, carrot, zucchini
italian seasoning  basil, cilantro, oregano, parsley, dill
cabbage            coleslaw mix, sauerkraut, bok choy, napa cabbage
Figure 6: Ingredient substitution clusters. Nodes represent clusters and edges indicate the presence of recommended substitutions that span clusters. Each cluster represents a set of related ingredients which are frequently substituted for one another.
Finally, we examined whether the substitution network encodes preferences for one ingredient over another, as evidenced by the relative ratings of similar recipes, one which contains an original ingredient, and another which implements a substitution. To test this hypothesis, we constructed a “preference network”, in which one ingredient is preferred to another in terms of received ratings. It is built by creating an edge (a, b) between a pair of ingredients, where a and b are listed in two recipes X and Y respectively, if the recipe ratings satisfy R_X > R_Y. For example, if recipe X includes beef, ketchup and cheese, and recipe Y contains beef and pickles, then this recipe pair contributes to two edges: one from pickles to ketchup, and the other from pickles to cheese. The aggregate edge weights are defined based on PMI. Because PMI is a symmetric quantity (pmi(a; b) = pmi(b; a)), we introduce a directed PMI measure to cope with the directionality of the preference network:
\[ \mathrm{pmi}(a \to b) = \log \frac{p(a \to b)}{p(a)\,p(b)}, \]
where
\[ p(a \to b) = \frac{\#\text{ of recipe pairs from } a \text{ to } b}{\#\text{ of recipe pairs}}, \]
and p(a), p(b) are defined as in the previous section.
Figure 7: Relationships between ingredients located within two of the clusters from Fig. 6. Panels: (a) milk substitutes (milk, heavy whipping cream, whole milk, skim milk, whipping cream, heavy cream, buttermilk, soy milk, half and half, evaporated milk, cream); (b) cinnamon substitutes (cinnamon, ginger, pumpkin pie spice, cardamom, nutmeg, allspice, ginger root, clove, mace).
Comparing the substitution network with this preference network, we found a high correlation between the two networks (ρ = 0.72, p < 0.001). This observation suggests that the substitute network encodes users’ ingredient preference, which we will use in the recipe prediction task described in the next section.
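The directed PMI used for the preference network can be computed from counts as in the sketch below. It rests on assumptions that the text leaves open: the caller decides which recipe of each pair is the edge source and which is the target, ingredients shared by both recipes contribute no edges (as in the beef example), and the ingredient probabilities p(a) are supplied from elsewhere. The probabilities in the example are invented.

```python
import math
from collections import Counter

def directed_pmi(recipe_pairs, ingredient_prob):
    """Compute pmi(a -> b) = log(p(a -> b) / (p(a) p(b))) for every observed edge.

    recipe_pairs: list of (source_ingredients, target_ingredients) sets.
    ingredient_prob: mapping from ingredient to p(ingredient), assumed given.
    """
    n_pairs = len(recipe_pairs)
    edge_pairs = Counter()            # (a, b) -> number of recipe pairs from a to b
    for source, target in recipe_pairs:
        shared = source & target      # shared ingredients create no edges
        for a in source - shared:
            for b in target - shared:
                edge_pairs[(a, b)] += 1
    return {
        (a, b): math.log((c / n_pairs) / (ingredient_prob[a] * ingredient_prob[b]))
        for (a, b), c in edge_pairs.items()
    }

# The example from the text: Y = {beef, pickles}, X = {beef, ketchup, cheese},
# giving edges pickles -> ketchup and pickles -> cheese.
recipe_pairs = [({"beef", "pickles"}, {"beef", "ketchup", "cheese"})]
ingredient_prob = {"beef": 0.5, "pickles": 0.5, "ketchup": 0.5, "cheese": 0.25}  # invented
scores = directed_pmi(recipe_pairs, ingredient_prob)
```

Unlike the symmetric PMI, `scores[(a, b)]` and `scores[(b, a)]` are computed from separate counts and generally differ.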
7. RECIPE RECOMMENDATION
We use the above insights to uncover novel recommendation algorithms suitable for recipe recommendations. We use ingredients and the relationships encoded between them in ingredient networks as our main feature sets to predict recipe ratings, and compare them against features encoding nutrition information, as well as other baseline features such as cooking methods, and preparation and cook time. Then we apply a discriminative machine learning method, stochastic gradient boosting tree [7], to predict recipe ratings.
In the experiments, we seek to answer three questions useful for recipe recommendation: (1) Can we predict users’ preference for a new recipe given the information present in the recipe? (2) What are the key aspects that determine users’ preference? (3) Does the structure of ingredient networks help in recipe recommendation, and how?
We shall answer these questions through a prediction task.
7.1 Recipe Pair Prediction
The goal of our prediction task is: given a pair of similar recipes, determine which one has the higher average rating. This task is designed particularly to help users who have a specific dish or meal in mind and are trying to decide between several recipe options for that dish.
Recipe pair data. The data for this prediction task consists of pairs of similar recipes. The reason for selecting similar recipes, with high ingredient overlap, is that while apples may be quite comparable to oranges in the context of recipes, especially if one is evaluating salads or desserts, lasagna may not be comparable to a mixed drink. To derive pairs of related recipes, we computed the cosine similarity between the ingredient lists for the two recipes, weighted by the inverse document frequency, log(# of recipes / # of recipes containing the ingredient). We considered only those pairs of recipes whose cosine similarity exceeded 0.2. The weighting is intended to identify higher similarity among recipes sharing more distinguishing ingredients, such as Brussels sprouts, as opposed to recipes sharing very common ones, such as butter.
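The IDF-weighted cosine similarity used to select recipe pairs can be sketched as follows. It assumes each recipe is a binary ingredient vector whose entries are scaled by the ingredient's IDF; the toy recipes and function names are illustrative only.

```python
import math
from collections import Counter

def idf_weights(recipes):
    """idf(ingredient) = log(# of recipes / # of recipes containing the ingredient)."""
    n = len(recipes)
    df = Counter(ing for recipe in recipes for ing in set(recipe))
    return {ing: math.log(n / count) for ing, count in df.items()}

def recipe_similarity(a, b, idf):
    """Cosine similarity between two IDF-weighted binary ingredient vectors."""
    a, b = set(a), set(b)
    dot = sum(idf[i] ** 2 for i in a & b)
    norm_a = math.sqrt(sum(idf[i] ** 2 for i in a))
    norm_b = math.sqrt(sum(idf[i] ** 2 for i in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_pairs(recipes, threshold=0.2):
    """Indices (i, j), i < j, of recipe pairs whose similarity exceeds the threshold."""
    idf = idf_weights(recipes)
    return [
        (i, j)
        for i in range(len(recipes))
        for j in range(i + 1, len(recipes))
        if recipe_similarity(recipes[i], recipes[j], idf) > threshold
    ]

# Made-up recipes: butter appears everywhere, so it carries zero weight.
recipes = [
    ["butter", "brussels sprouts", "bacon"],
    ["butter", "brussels sprouts", "bacon", "garlic"],
    ["butter", "flour", "sugar"],
    ["butter", "egg", "milk"],
]
pairs = similar_pairs(recipes)
sim01 = recipe_similarity(recipes[0], recipes[1], idf_weights(recipes))
```

Because butter occurs in every toy recipe its IDF is zero, so recipes that share only butter score 0 and are filtered out, which mirrors the intent of the weighting described above.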
A further challenge to obtaining reliable relative rankings of recipes is variance introduced by having different users choose to rate different recipes. In addition, some users might not have a sufficient number of reviews under their belt to have calibrated their own rating scheme. To control for variation introduced by users, we examined recipe pairs where the same users are rating both recipes and are collectively expressing a preference for one recipe over another. Specifically, we generated 62,031 recipe pairs (a, b) where rating_i(a) > rating_i(b) for at least 10 users i, and for over 50% of the users who rated both recipe a and recipe b. Furthermore, each user i should be an active enough reviewer to have rated at least 8 other recipes.
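The pair-selection rule can be sketched as below. Two readings are assumed here and are not confirmed by the text: "at least 8 other recipes" is taken to mean at least 10 rated recipes in total (the pair plus 8 others), and the 50% share is computed over active users who rated both recipes. The similarity filter described earlier is not applied in this sketch, and the ratings are synthetic.

```python
from collections import defaultdict
from itertools import combinations

def preferred_pairs(ratings, min_users=10, min_share=0.5, min_other=8):
    """Return (preferred, other) recipe pairs agreed on by enough active users.

    ratings: {user: {recipe: rating}}.
    """
    prefer = defaultdict(int)   # (a, b) -> number of users with rating(a) > rating(b)
    both = defaultdict(int)     # frozenset({a, b}) -> number of users who rated both
    for user_ratings in ratings.values():
        if len(user_ratings) < min_other + 2:   # skip users who are not active enough
            continue
        for a, b in combinations(sorted(user_ratings), 2):
            both[frozenset((a, b))] += 1
            if user_ratings[a] > user_ratings[b]:
                prefer[(a, b)] += 1
            elif user_ratings[b] > user_ratings[a]:
                prefer[(b, a)] += 1
    return [
        (a, b)
        for (a, b), n in prefer.items()
        if n >= min_users and n > min_share * both[frozenset((a, b))]
    ]

# Synthetic data: 12 active users, 11 of whom prefer "lasagna A" to "lasagna B".
ratings = {}
for u in range(12):
    r = {f"filler{k}": 4 for k in range(8)}
    r["lasagna A"], r["lasagna B"] = (5, 3) if u < 11 else (2, 4)
    ratings[f"user{u}"] = r
pairs = preferred_pairs(ratings)
```

The loop is quadratic in the number of recipes each user has rated, which is acceptable for a sketch but would need care on a full review corpus.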
Features. In the prediction dataset, each observation consists of a set of predictor variables or features that represent information about two recipes, and the response variable is a binary indicator of which gets the higher rating on average. To study the key aspects of recipe information, we constructed different sets of features, including:
• Baseline: This includes cooking methods, such as chopping, marinating, or grilling, and cooking effort descriptors, such as preparation time in minutes, as well as the number of servings produced, etc. These features are considered as primary information about a recipe and will be included in all other feature sets described below.
• Full ingredients: We selected up to 1000 popular ingredients to build a “full ingredient list”. In this feature set, each observed recipe pair contains a vector with entries indicating whether an ingredient in the full list is present in either recipe in the pair.
• Nutrition: This feature set does not include any ingredients but only nutrition information such as the total caloric content, as well as the amount of fats, carbohydrates, etc.
• Ingredient networks: In this set, we replaced the full ingredient list by structural information extracted from different ingredient networks, as described in Sections 5 and 6.3. Co-occurrence is treated separately as a raw count, and a complementarity, captured by the PMI.
• Combined set: Finally, a combined feature set is constructed to test the performance of a combination of features, including baseline, nutrition and ingredient networks.
To build the ingredient network feature set, we extracted the following two types of structural information from the co-occurrence and substitution networks, as well as the complement network derived from the co-occurrence information:
Network positions are calculated to represent how a recipe’s ingredients occupy positions within the networks. Such position measures are likely to indicate whether a recipe contains any “popular” or “unusual” ingredients. To calculate the position measures, we first calculated various network centrality measures, including degree centrality, betweenness centrality, etc., from the ingredient networks. A centrality measure can be represented as a vector $\vec{g}$ where each entry indicates the centrality of an ingredient. The network position of a recipe, with its full ingredient list represented as a binary vector $\vec{f}$, can be summarized by $\vec{g}^{T} \vec{f}$, i.e., an aggregated centrality measure based on the centrality of its ingredients.
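The network position feature, the product of a centrality vector with a recipe's binary ingredient vector, can be illustrated with degree centrality on a small undirected graph. The graph, the normalization by (n − 1), and the function names are assumptions for illustration; the paper also uses other centrality measures that are not shown here.

```python
def degree_centrality(edges, ingredients):
    """Number of distinct neighbors of each ingredient, divided by (n - 1)."""
    neighbors = {name: set() for name in ingredients}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scale = max(len(ingredients) - 1, 1)
    return [len(neighbors[name]) / scale for name in ingredients]

def network_position(recipe, ingredients, centrality):
    """Aggregate centrality g^T f, where f is the recipe's binary ingredient vector."""
    f = [1 if name in recipe else 0 for name in ingredients]
    return sum(g_i * f_i for g_i, f_i in zip(centrality, f))

# Made-up ingredient network.
ingredients = ["butter", "flour", "sugar", "saffron"]
edges = [("butter", "flour"), ("butter", "sugar"), ("flour", "sugar"), ("butter", "saffron")]
g = degree_centrality(edges, ingredients)
position = network_position({"butter", "saffron"}, ingredients, g)
```

Each centrality measure and each network yields one such scalar per recipe, so the position features stay low-dimensional regardless of the size of the ingredient list.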
Figure 8: Prediction performance. The nutrition information and ingredient networks are more effective features than full ingredients. The ingredient network features lead to impressive performance close to the best performance, indicating the power of network structures in recipe recommendation. [Plot: accuracy (0.60 to 0.80) for the feature sets baseline, full ingredients, nutrition, ing. networks and combined.]
Figure 9: Relative importance of features in the combined set. The individual items from nutrition information are very indicative in differentiating high-rated recipes, while most of the prediction power comes from ingredient networks. [Plot: feature importance (0.0 to 1.0) for the top 100 features, grouped as nutrition (6.5%), cook effort (5.0%), ing. networks (84%) and cook methods (3.9%).]
Network communities provide information about which ingredient is more likely to co-occur with a group of other ingredients in the network. A recipe consisting of ingredients that are frequently used with, complemented by or substituted by certain groups may be predictive of the ratings the recipe will receive. To obtain the network community information, we applied latent semantic analysis (LSA) on recipes. We first factorized each ingredient network, represented by matrix W, using singular value decomposition (SVD). In the matrix W, each entry W_ij indicates whether ingredient i co-occurs with, complements or substitutes ingredient j.
Suppose $W_k = U_k \Sigma_k V_k^T$ is a rank-k approximation of W; we can then transform each recipe’s full ingredient list using the low-dimensional representation, $\Sigma_k^{-1} V_k^T \vec{f}$, as community information within a network. These low-dimensional vectors, together with the vectors of network positions, constitute the ingredient network features.
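As a reading aid, the dimensions involved in this projection can be spelled out. The symbols n (number of ingredients in the network), $\sigma_m$ (the m-th retained singular value, assumed positive) and $\vec{c}$ (the resulting community vector) are notation introduced here for illustration and do not appear in the original text.

```latex
W \in \mathbb{R}^{n \times n}, \quad
U_k, V_k \in \mathbb{R}^{n \times k}, \quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
\vec{f} \in \{0, 1\}^n
\;\Longrightarrow\;
\vec{c} = \Sigma_k^{-1} V_k^T \vec{f} \in \mathbb{R}^k,
\qquad
c_m = \frac{1}{\sigma_m} \sum_{i \,:\, f_i = 1} (V_k)_{im}, \quad m = 1, \dots, k.
```

In words, each of the k coordinates sums the loadings of the recipe's ingredients on one right singular vector and rescales the sum by the inverse of the corresponding singular value.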
Figure 10: Relative importance of features representing the network structure. The substitution network makes a stronger contribution (39.8%) to the total importance of ingredient network features than the other two networks, and it also has more influential features in the top 100 list, which suggests that the information about the substitution network is complementary to other features. [Plot: feature importance (0.0 to 0.7) for the top 100 features, by network: substitution (39.8%), co-occurrence (30.9%), complement (29.2%).]
Learning method. We applied discriminative machine learning methods such as support vector machines (SVM) [2] and stochastic gradient boosting trees [6] to our prediction problem. Here we report and discuss the detailed results based on the gradient boosting tree model. Like SVM, the gradient boosting tree model seeks a parameterized classifier, but unlike SVM, which considers all the features at one time, the boosting tree model considers a set of features at a time and iteratively combines them according to their empirical errors. In practice, it not only has competitive performance comparable to SVM, but can also serve as a feature ranking procedure [11].
In this work, we fitted a stochastic gradient boosting tree model with 8 terminal nodes under an exponential loss function. The dataset is roughly balanced in terms of which recipe is the higher-rated one within a pair. We randomly divided the dataset into a training set (2/3) and a testing set (1/3). The prediction performance is evaluated based on accuracy, and the feature performance is evaluated in terms of relative importance [9]. Within each single decision tree, at each internal node one of the input variables, $x_j$, is used to partition the region associated with that node into two subregions in order to fit to the response values. The squared relative importance of variable $x_j$ is the sum of such squared improvements over all internal nodes for which it was chosen as the splitting variable, as:
\[ \mathrm{imp}(j) = \sum_k i_k^2 \, I(\text{splits on } x_j) \]
where $i_k^2$ is the empirical improvement by the k-th node splitting on $x_j$ at that point.
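The importance formula amounts to summing the squared improvements of a tree's internal nodes by splitting variable. The sketch below does this for one hand-written tree with 7 internal nodes (a binary tree with 8 terminal nodes). The feature names and improvement values are invented, and the conversion to percentages is an assumption about how group shares might be reported, not a procedure stated in the text.

```python
from collections import defaultdict

def relative_importance(internal_nodes):
    """Sum squared improvements per splitting variable for a single tree.

    internal_nodes: iterable of (split_feature, squared_improvement).
    """
    imp = defaultdict(float)
    for feature, squared_improvement in internal_nodes:
        imp[feature] += squared_improvement
    return dict(imp)

def as_percentages(imp):
    """Rescale importances so that they sum to 100 (assumed normalization)."""
    total = sum(imp.values())
    return {feature: 100.0 * value / total for feature, value in imp.items()}

# Invented values for one tree with 7 internal nodes.
tree = [
    ("carbs", 0.30),
    ("substitution_svd_4", 0.50),
    ("carbs", 0.10),
    ("prep_time", 0.05),
    ("substitution_svd_4", 0.25),
    ("calories", 0.20),
    ("prep_time", 0.10),
]
imp = relative_importance(tree)
shares = as_percentages(imp)
```

For a boosted ensemble, the per-tree values would typically be averaged over all trees before ranking features, as described in the reference cited for relative importance [9].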
7.2 Results
The overall prediction performance is shown in Fig. 8. Surprisingly, even with a full list of ingredients, the prediction accuracy is only improved from .712 (baseline) to .746. In contrast, the nutrition information and ingredient networks are more effective (with accuracy .753 and .786, respectively). Both of them have much lower dimensions (from tens to several hundreds), compared with the full ingredients that are represented by more than 2000 dimensions (1000 ingredients per recipe in the pair). The ingredient network features lead to impressive performance close to the best performance given by the combined set (.792), indicating the power of network structures in recipe recommendation.
Figure 11: Relative importance of features from nutrition information. The carbs item is the most influential feature in predicting higher-rated recipes. [Plot: feature importance (0.0 to 1.0) for nutrition features: carbs (20.9%), cholesterol (17.7%), calories (19.7%), sodium (16.8%), fiber (12.3%), fat (12.4%).]
Figure 9 shows the influence of different features in the combined feature set. Up to 100 features with the highest relative importance are shown. The importance of a feature group is summarized by how much of the total importance is contributed by all features in the set. For example, the baseline, consisting of cooking effort and cooking methods, has contributed 8.9% to the overall performance. The individual items from nutrition information are very indicative in differentiating highly-rated recipes, while most of the prediction power comes from ingredient networks (84%).
Figure 10 shows the top 100 features from the three networks. In terms of the total importance of ingredient network features, the substitution network makes a slightly stronger contribution (39.8%) than the other two networks, and it also has more influential features in the top 100 list. This suggests that the structural information extracted from the substitution network is not only important but also complementary to information from other aspects.
Looking into the nutrition information (Fig. 11), we found that carbohydrates are the most influential feature in predicting higher-rated recipes. Since carbohydrates comprise around 50% or more of total calories, the high importance of this feature interestingly suggests that a recipe’s rating can be influenced by users’ concerns about nutrition and diet. Another interesting observation is that, while individual nutrition items are powerful predictors, a higher prediction accuracy can be reached by using ingredient networks alone, as shown in Fig. 8. This implies that the information about nutrition may have been encoded in the ingredient network structure, e.g. substitutions of less healthful ingredients with “healthier” alternatives.
Constructing the ingredient network feature involves reducing high-dimensional network information through SVD, as described in the previous section. The dimensionality can be determined by cross-validation. As shown in Fig. 12, features with a very large dimension tend to overfit the training data. Hence we chose k = 50 for the reduced dimension of all three networks. The figure also shows that using the information about the complement network alone is more effective in prediction than using the co-occurrence or substitute networks alone, even in the case of low dimensions. However, as shown in terms of relative importance (Fig. 10), although the substitution network alone is not the most effective, it provides more complementary information in the combined feature set.
Figure 12: Prediction performance over reduced dimensionality. The dimensionality of network features can be determined by cross-validation. The best performance is given by reduced dimension k = 50 when combining all three networks. In addition, using the information about the complement network alone is more effective in prediction than using the other two networks. [Plot: x-axis, dimensions (10 to 70); y-axis, accuracy (0.76 to 0.80); series: combined, substitution, complement, co-occurrence.]
Figure 13: Influential substitution communities. The matrix shows the most influential feature dimensions extracted from the substitution network. For each dimension, the six representative ingredients with the highest intensity values in the decomposed matrix are shown, with colors indicating their intensity. These features suggest that the communities of ingredient substitutes, such as the sweet and oil in the first dimension, are particularly informative in prediction. [Heat map: axes, ingredient and SVD dimension; color key, value from -0.5 to 0.5.]
In Fig. 13 we show the most representative ingredients in the decomposed matrix derived from the substitution network. We display the top five influential dimensions, evaluated based on the relative importance, from the SVD resultant matrix $V_k$, and in each of these dimensions we extracted six representative ingredients based on their intensities in the dimension (the squared entry values). These representative ingredients suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension or the milk substitutes in the second dimension (which is similar to the cluster shown in Fig. 6), are particularly informative in predicting recipe ratings.
To summarize our observations from the experiments, we found we were able to effectively predict users’ preference for a recipe, but the prediction is not through using a full list of ingredients. Instead, by using the structural information extracted from the relationships among ingredients, we can better uncover users’ preference about recipes.
8. CONCLUSION
Recipes are little more than instructions for combining and processing sets of ingredients. Individual cookbooks, even the most expansive ones, contain single recipes for each dish. The web, however, permits collaborative recipe generation and modification, with tens of thousands of recipes contributed in individual websites. We have shown how this data can be used to glean insights about regional preferences and modifiability of individual ingredients, and also how it can be used to construct two kinds of networks, one of ingredient complements, the other of ingredient substitutes. These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.
In future work, we plan to extend ingredient networks to incorporate the cooking methods as well. It would also be of interest to generate region-specific and diet-specific ratings, depending on the users’ background and preferences. A whole host of user-interface features could be added for users who are interacting with recipes, whether the recipe is newly submitted, and hence unrated, or whether they are browsing a cookbook. In addition to automatically predicting a rating for the recipe, one could flag ingredients that can be omitted, ones whose quantity could be tweaked, as well as suggested additions and substitutions.
9. ACKNOWLEDGMENTS
This work was supported by MURI award FA9550-08-1-0265 from the Air Force Office of Scientific Research. The methodology used in this paper was developed with funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography.
10. REFERENCES[1] S. M. Boback, C. L. Cox, B. D. Ott, R. Carmody,
R. W. Wrangham, and S. M. Secor. Cooking andgrinding reduces the cost of meat digestion.Comparative Biochemistry and Physiology - Part A:Molecular and Integrative Physiology, 148(3):651 –656, 2007.
[2] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[3] P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. Proceedings of Recommender Systems, 2011.
[4] J. Freyne and S. Berkovsky. Intelligent food planning: personalized recipe recommendation. In IUI, pages 321–324. ACM, 2010.
[5] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. User Modeling, Adaptation, and Personalization, pages 381–386, 2010.
[6] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
[7] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28:2000, 1998.
[8] G. Geleijnse, P. Nachtigall, P. van Kaam, and L. Wijgergangs. A personalized recipe advice system to promote healthful choices. In IUI, pages 437–438. ACM, 2011.
[9] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 2005.
[10] F. Kamieth, A. Braun, and C. Schlehuber. Adaptive implicit interaction for healthy nutrition and food intake supervision. Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments, pages 205–212, 2011.
[11] Y. Lu, F. Peng, X. Li, and N. Ahmed. Coupling feature selection and machine learning methods for navigational query identification. In CIKM, pages 682–689. ACM, 2006.
[12] I. Rombauer, M. Becker, E. Becker, and L. Maestro. Joy of cooking. Scribner Book Company, 1997.
[13] M. Rosvall and C. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 105(4):1118, 2008.
[14] Y. Shidochi, T. Takahashi, I. Ide, and H. Murase. Finding replaceable materials in cooking recipe texts considering characteristic cooking actions. In Proc. of the ACM multimedia 2009 workshop on Multimedia for cooking and eating activities, pages 9–14. ACM, 2009.
[15] M. Svensson, K. Hook, and R. Coster. Designing and evaluating Kalas: A social navigation system for food recipes. ACM Transactions on Computer-Human Interaction (TOCHI), 12(3):374–400, 2005.
[16] M. Ueda, M. Takahata, and S. Nakajima. User’s food preference extraction for personalized cooking recipe recommendation. Proc. of the Second Workshop on Semantic Personalized Information Management: Retrieval and Recommendation, 2011.
[17] L. Wang, Q. Li, N. Li, G. Dong, and Y. Yang. Substructure similarity measurement in Chinese recipes. In WWW, pages 979–988. ACM, 2008.
[18] Wikipedia. Outline of food preparation, 2011. [Online; accessed 22-Oct-2011].
[19] R. Wrangham. Catching fire: how cooking made us human. Profile Books, 2010.
[20] Q. Zhang, R. Hu, B. Mac Namee, and S. Delany. Back to the future: Knowledge light case base cookery. In Proc. of The 9th European Conference on Case-Based Reasoning Workshop, page 15, 2008.