
Recipe recommendation using ingredient networks

Chun-Yuen Teng, School of Information, University of Michigan, Ann Arbor, MI, USA


Yu-Ru Lin, IQSS, Harvard University; CCS, Northeastern University, Boston, MA


Lada A. Adamic, School of Information, University of Michigan, Ann Arbor, MI, USA


ABSTRACT
The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose large scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. These insights include preferences for cooking methods depending on the nutritional value extracted from food, and the geographic region from which the recipe originates. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications—Data mining

General Terms
Measurement; Experimentation

Keywords
ingredient networks, recipe recommendation


1. INTRODUCTION
The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that have been passed down for generations to new concoctions they created that afternoon, motivated in part by the ability to share the result online. Once shared, the recipes are tried out and evaluated by other users, who supply ratings and comments.

The desire to look up recipes online may at first appear odd given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial additional value in online recipes, beyond being able to look them up. While the Joy of Cooking contains a single recipe for Swedish meatballs, allrecipes.com has "Swedish Meatballs I", "II", and "III", submitted by different users, along with 4 other variants, including "The Amazing Swedish Meatball". Each variant has been reviewed, from 329 reviews for "Swedish Meatballs I" to 5 reviews for "Swedish Meatballs III". The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g. using ground turkey instead of beef, or skipping the "cream of wheat" because it is rarely on hand.

The wealth of information captured by online collaborative recipe-sharing sites reveals not only the fundamentals of cooking, but also user preferences. The co-occurrence of ingredients in tens of thousands of recipes provides information about which ingredients go well together, and when a pairing is unusual. Users' reviews provide clues as to the flexibility of a recipe and of the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which contain vetted recipes that may nonetheless not be the best variants for some individuals' tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not.

In this paper, we seek to distill the collective knowledge and preferences about cooking by mining a popular recipe-sharing website. To extract such information, we first parse the unstructured text of the recipes and users' reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users' knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy 0.792), while most of the prediction power comes from the ingredient networks (84%).

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset and Section 4 provides a descriptive analysis of the data. Section 5 discusses the extraction of the ingredient complement network and the characteristics of this network. Section 6 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 7 presents our experiments on recipe recommendation and Section 8 concludes.

2. RELATED WORK
Recipe recommendation has been the subject of much prior work. Typically the goal has been to suggest recipes to users based on their past recipe ratings [5][15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either treating each ingredient equally [4] or weighting ingredients according to the focus of the recipe [20]. For example, for "grilled chicken with basil dressing", chicken is assigned a higher weight than basil. Instead of modeling recipes as sets of ingredients, Wang et al. [17] represent recipes as graphs built on ingredients and cooking directions, and they demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and doesn't take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.

Another branch of research has focused on recommending recipes based on desired nutritional intake or promoting healthy food choices. Geleijnse et al. [8] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [10] built a personalized recipe recommendation system based on availability of ingredients and personal nutritional needs. Shidochi et al. [14] proposed an algorithm to extract replaceable ingredients from recipes in order to satisfy users' various demands, such as calorie constraints and food availability. Their method identifies substitutable ingredients by matching the cooking actions that correspond to ingredient names. However, their assumption that substitutable ingredients are subject to the same processing methods is less direct and specific than extracting substitutions directly from user-contributed suggestions. In the present paper, we use a data-driven approach to construct a substitute ingredient network, and derive clusters of substitutable ingredients from this network.

3. DATASET
Allrecipes is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. Launched in 1997, it receives more than 535 million visits annually from over 5 million members. It hosts 16 customized international sites for users to share their recipes in their native languages. Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe and suggest changes and improvements. In addition to rating and commenting on recipes, users are able to save them as favorites or recommend them to others through a forum.

We downloaded 46,337 recipes from allrecipes.com, including all listed information and several classifications, such as region (e.g. the Midwest region of the US, or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users' recipe preferences, we crawled 1,976,920 reviews, which include reviewers' ratings, review text, and the number of users who voted the review as useful. We further downloaded information on the 530,609 users who reviewed and rated recipes. This information includes links to users' pages, their interests, the date they joined the site, their cooking experience level, hometown, and the current city they live in.

3.1 Data preprocessing
The first step in processing the recipes is identifying the ingredients listed. Matching on predefined lists of ingredients often missed or misidentified ingredients commonly supplied by users. We therefore derived the list of ingredients from the recipes themselves through the following procedure. We removed quantifiers, such as "1 lb" or "2 cups", and words referring to consistency or temperature, e.g. chopped or cold, and applied a few other heuristics, such as removing content in parentheses. For example, "1 (28 ounce) can baked beans (such as Bush's Original®)" is identified as "baked beans". We erred on the side of not conflating potentially identical or highly similar ingredients, e.g. "cheddar cheese", used in 2450 recipes, was considered different from "sharp cheddar cheese", occurring in 394 recipes.
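The cleanup steps above can be sketched as a small normalization function. This is a minimal Python sketch under stated assumptions: the word lists UNITS and DESCRIPTORS and the function name normalize_ingredient are hypothetical and illustrative, not the lists the authors used, and the paper's "few other heuristics" beyond parenthesis removal are not reproduced.

```python
import re

# Hypothetical, illustrative word lists; the authors' actual lists are not published here.
UNITS = {"lb", "lbs", "pound", "pounds", "ounce", "ounces", "oz", "cup", "cups",
         "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "can", "cans", "package", "packages"}
DESCRIPTORS = {"chopped", "diced", "minced", "sliced", "shredded",
               "cold", "warm", "softened", "melted"}

def normalize_ingredient(line: str) -> str:
    """Reduce a raw ingredient line to a bare ingredient name (heuristic sketch)."""
    text = line.lower()
    text = re.sub(r"\([^)]*\)", " ", text)   # drop content in parentheses
    text = re.sub(r"[®™,]", " ", text)        # drop trademark symbols and commas
    kept = []
    for tok in text.split():
        if re.fullmatch(r"[\d/.\-]+", tok):   # quantities such as 1, 1/2, 2.5
            continue
        if tok in UNITS or tok in DESCRIPTORS:
            continue
        kept.append(tok)
    return " ".join(kept)

print(normalize_ingredient("1 (28 ounce) can baked beans (such as Bush's Original®)"))
print(normalize_ingredient("1 cup shredded sharp cheddar cheese"))
```

Because "sharp" is not treated as a descriptor, "sharp cheddar cheese" stays distinct from "cheddar cheese", in line with the choice not to conflate similar ingredients.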

We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the 1000 most common ingredient names as our finalized ingredient list. Each of the top 1000 ingredients occurred in 23 or more recipes, with plain salt making an appearance in 21,916 recipes. These ingredients also accounted for 94.9% of ingredient entries in the recipe dataset. The remaining ingredients fell outside this list because of high specificity (e.g. yolk-free egg noodle), references to brand names (e.g. Planters almonds), rarity (e.g. serviceberry), misspellings, or not being a food (e.g. "nylon netting").
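Selecting the most frequent names and measuring how much of the data they cover is a simple counting step. The sketch below assumes recipes have already been reduced to lists of normalized names; the function name top_ingredients and the toy recipes are hypothetical. It counts every ingredient entry, so the returned share corresponds to the "percentage of ingredient entries" statistic reported above (94.9% for the top 1000 on the real data).

```python
from collections import Counter

def top_ingredients(recipes, k):
    """Return the k most frequent ingredient names and the share of all
    ingredient entries they cover. `recipes` is a list of lists of
    already-normalized ingredient names, one list per recipe."""
    counts = Counter(name for recipe in recipes for name in recipe)
    top = [name for name, _ in counts.most_common(k)]
    covered = sum(counts[name] for name in top)
    total = sum(counts.values())
    return top, covered / total

# Toy data standing in for the 46,337 parsed recipes.
recipes = [
    ["salt", "flour", "butter"],
    ["salt", "egg", "flour"],
    ["salt", "serviceberry"],
]
top, coverage = top_ingredients(recipes, k=2)
print(top, coverage)
```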

The remaining processing task involved identifying cooking processes from the directions. We first identified all heating methods using a listing in the Wikipedia entry on cooking [18]. For example, baking, boiling, and steaming are all ways of heating the food. We then identified mechanical ways of processing the food, such as chopping and grinding, and chemical techniques, such as marinating and brining.
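One simple way to tag recipe directions with method categories is keyword matching. The regular expressions below are a hypothetical, much shorter stand-in for the authors' lists: they cover only the example methods named in this section, and simply mixing or tossing is deliberately left unmatched, since the descriptive analysis treats those separately from mechanical breakdown. The names METHOD_PATTERNS and method_types are illustrative.

```python
import re

# Illustrative patterns built only from the examples in the text;
# the paper's full lists (e.g. the Wikipedia-derived heating methods) are not reproduced.
METHOD_PATTERNS = {
    "heating": r"\b(bak(e|ed|es|ing)|boil(s|ed|ing)?|steam(s|ed|ing)?)\b",
    "mechanical": r"\b(chop(s|ped|ping)?|grind(s|ing)?)\b",
    "chemical": r"\b(marinat(e|ed|es|ing)|brin(e|ed|es|ing))\b",
}

def method_types(directions: str) -> set[str]:
    """Return the method categories whose patterns occur in the directions text."""
    text = directions.lower()
    return {category for category, pattern in METHOD_PATTERNS.items()
            if re.search(pattern, text)}

print(method_types("Chop the onions. Marinate the chicken overnight, then bake for 30 minutes."))
```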

4. DESCRIPTIVE ANALYSIS
One of the interesting aspects of this dataset is that it allows us to obtain a large-scale view of cooking methods. Here we discuss how different cooking methods vary with user and regional preference.

4.1 Why we cook
It has been suggested that cooking played a significant role in human evolution by allowing us to extract more energy value from food [19]. An experiment measuring the energy expended by the Burmese python in digesting meat has shown that grinding and cooking the meat each individually reduce the digestive cost to the snake, and combining both processing methods reduces the energy cost more than either method individually [1]. Interestingly, it appears that average recipe ratings correlate with the ability of the processing method to reduce digestive cost. Table 1 shows that recipes that call for cooking food have higher ratings than ones that merely break it down mechanically, which in turn are rated more highly than ones that simply "mix" or "toss" ingredients together. Furthermore, we observe that recipes with additional chemical processing, e.g. fermenting and marinating, tend to receive higher ratings than ones preparing the food with only heating and mechanical methods. However, perhaps due to the additional time and planning they require, they occur in only about 8% of the recipes in the dataset.

Table 1: Occurrence and average ratings of cooking methods

Method               Occurrence   Average rating
Mechanical methods   34759        3.60
Heating methods      40238        4.11
Chemical methods     3686         4.14
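The "about 8%" figure quoted above can be checked against Table 1, assuming each occurrence count is the number of recipes (out of the 46,337 downloaded) that use at least one method of that type. Because one recipe can use several kinds of method, the shares need not sum to 100%.

```python
# Occurrence counts from Table 1, assumed here to be recipe counts
# out of the 46,337 recipes in the dataset.
TOTAL_RECIPES = 46_337
occurrence = {"mechanical": 34_759, "heating": 40_238, "chemical": 3_686}

shares = {method: count / TOTAL_RECIPES for method, count in occurrence.items()}
for method, share in shares.items():
    print(f"{method:<10} {share:6.1%}")
```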

The preference for multiple food processing methods might at first be interpreted as a reflection of the sophistication of the recipe, with more complex recipes rated more highly. However, in general we find no correlation between the number of steps or the number of ingredients and the average rating a recipe receives, making it more likely that the digestibility of the prepared food is a factor in how highly rated it is.

4.2 Regional preferences
While cooking methods that make food more digestible tend to be preferred, choosing one method over another appears to be a question of regional taste. About 5.8% (n=2693) of recipes were classified into one of 5 US regions: Midwest, Northeast, South, West Coast (including Alaska and Hawaii), and Mountain. Figure 1 shows that preferences among 6 of the most popular cooking methods vary significantly across the US regions (χ² test, p-value < 0.001). Boiling and simmering, both involving heating food in hot liquids, are more common in the South and Midwest. Marinating and grilling are relatively more popular in the West and Mountain regions, but in the West more grilling recipes involve seafood (18/42 = 42%) relative to other regions combined (7/106 = 6%). Frying is popular in the South and Northeast. Baking is a universally popular and versatile technique, which is often used for both sweet and savory dishes, and is slightly more popular in the Northeast and Midwest. Examination of individual recipes reflecting these frequencies shows that these differences can be tied to differences in demographics, immigrant culture and availability of local ingredients, e.g. seafood.

Figure 1: The percentage of recipes by region that apply a specific cooking method. (Bar chart; x-axis: % in recipes, 0 to 40; methods: bake, fry, roast, grill, marinate, simmer, boil; regions: Midwest, Mountain, Northeast, South, West Coast.)
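The χ² test above compares method frequencies across the five regions, and the per-region counts behind Figure 1 are not given in the text. As a minimal pure-Python sketch, the same Pearson χ² test of independence can be applied to the one contingency table the text does give: grilling recipes that involve seafood in the West (18 of 42) versus all other regions combined (7 of 106). The helper name chi_square_2x2 is hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity
    correction). Returns the statistic and its p-value with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: West, other regions combined; columns: seafood grilling, other grilling.
grilling = [[18, 42 - 18], [7, 106 - 7]]
chi2, p = chi_square_2x2(grilling)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

For a table with more regions and methods the statistic is computed the same way, but the p-value needs the general χ² distribution with (rows - 1)(columns - 1) degrees of freedom rather than the one-degree shortcut used here.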

5. INGREDIENT COMPLEMENT NETWORK
Can we learn how to combine ingredients from the data? Here we use the occurrences of ingredients across recipes to learn users' knowledge about combinations of ingredients.

We constructed an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients (a, b):

$$\mathrm{pmi}(a;b) = \log \frac{p(a,b)}{p(a)\,p(b)},$$

where

$$p(a,b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}}, \qquad p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \qquad p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}}.$$

The PMI compares the probability that two ingredients occur together with the probability expected if they occurred independently of each other. Complementary ingredients tend to occur together far more often than would be expected by chance, and therefore have positive PMI.
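The PMI definition translates directly into code. The sketch below treats each recipe as a set of ingredient names, counts single and pairwise occurrences, and keeps pairs whose PMI exceeds a threshold. The threshold value, the natural logarithm, the toy recipes, and the function names pmi_scores and complement_edges are assumptions for illustration; the text does not state the threshold or log base used for Figure 2.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(recipes):
    """Compute pmi(a; b) = log(p(a, b) / (p(a) p(b))) for every ingredient pair
    that co-occurs in at least one recipe. Keys are alphabetically sorted pairs."""
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))
        single.update(items)                  # recipes containing each ingredient
        pair.update(combinations(items, 2))   # recipes containing each pair
    return {
        (a, b): math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), n_ab in pair.items()
    }

def complement_edges(recipes, threshold):
    """Keep only pairs whose PMI exceeds the (assumed) threshold."""
    return {p: s for p, s in pmi_scores(recipes).items() if s > threshold}

recipes = [
    {"flour", "sugar", "butter", "salt"},
    {"flour", "sugar", "egg", "salt"},
    {"chicken", "garlic", "salt"},
    {"chicken", "garlic", "onion", "salt"},
]
print(complement_edges(recipes, threshold=0.5))
```

In this toy data salt appears in every recipe, so its PMI with every other ingredient is exactly 0 and it never clears the threshold, a miniature version of the ubiquitous-ingredient effect visible in Figure 2.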

Figure 2 shows a visualization of ingredient complementarity. Two distinct subcommunities of ingredients are immediately apparent: one corresponding to savory dishes, the other to sweet ones. Some central ingredients, e.g. egg and salt, are actually pushed to the periphery of the network. They are so ubiquitous that, although they have many edges, the edges are all weak, since these ingredients don't show particular complementarity with any single group of ingredients.
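Why ubiquitous ingredients end up with weak edges follows directly from the PMI definition. Since p(a, b) = p(b) p(a | b) and p(a | b) ≤ 1, the PMI of any pair is bounded by how rare each ingredient is (the natural logarithm is assumed here, since the text does not state the base):

$$\mathrm{pmi}(a;b) = \log\frac{p(a,b)}{p(a)\,p(b)} = \log\frac{p(a \mid b)}{p(a)} \le \log\frac{1}{p(a)}.$$

For salt, which appears in 21,916 of the 46,337 recipes, p(salt) ≈ 0.473, so for every ingredient b

$$\mathrm{pmi}(\text{salt};b) \le \ln\frac{46{,}337}{21{,}916} \approx 0.75,$$

and an ingredient present in every recipe would have p(a) = 1 and therefore a PMI of exactly 0 with everything.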

We further probed the structure of the complementaritynetwork by applying a network clustering algorithm. Thealgorithm confirmed the existence of two main clusters con-taining the vast majority of the ingredients. An interestingsatellite cluster is that of ingredients for mixed drinks, which

cherry gelatin

graham cracker

low fat cottage cheese

pork shoulder roast

heavy whipping cream tofu

bok choy

butter cracker

baking soda

pimento pepper

milk powder

chorizo sausage

lady�nger

steak sauce

crimushroom

radishe

shiitake mushroom

pesto

brownie mix

pumpkin pie spice

rye �our

cardamom

sa�ron thread

linguine

corn

fat free sour cream

basmati rice

bittersweet chocolate

bay

corn chip

cracker

french green bean

poppy seed

vegetable oil

grape tomato

pizza crust dough

low sodium beef broth

club soda

lard

soy saucepanko bread

couscou

crab meat

mango

unpastry shell

catalina dressing

pasta shell

italian salad dressing

mexican corn

decorating gel

italian bread

napa cabbage

onion powder

white wine vinegar

cocktail rye bread

basil sauce

crouton

brown gravy mix

barbeque sauce

apple cider vinegar

hoagie roll

milk chocolate candy kisse

�ounder

salt black pepper

maraschino cherry juice

chow mein noodle

tiger prawn

banana pepper

cranberry

vermicelli pasta

root beer

strawberry jam

lemon gelatin mix

creamed corn

pretzel

pie shell

sun�ower kernel

rump roast

romaine

vegetable stock

lemon pepper seasoning

guacamole

louisiana hot sauce

cabbage

yellow onion

super�ne sugar

orange peel

raspberry

cumin seedcandied mixed fruit peel

cream of coconut

bow tie pasta

creme fraiche

currant

pork chop

turkey gravy

fat free half and half

chicken ramen noodle

wooden skewer

whipping cream

mace

seasoning salt

mozzarella cheesepasta sauce

lean pork

broccoli �oweret

tomatillo

lemonade

tomato paste

caesar dressing

basil pesto

melon liqueur

coconut milk

whole wheat pastry �our

muenster cheese

lump crab meat

angel food cake

ring

cheese tortellini

spiral pasta

vanilla pudding

cauli�oweret

smoked sausage

hot dog

pita bread

cocoa powder

garbanzo bean

tart apple

wheat bran

hot pepper sauce

chili

refried bean

salmon steak

white cheddar cheese

low fat mayonnaise

grapefruit

dijon mustard

tomato juice

yellow squash

baking apple

cream of tartar

vodka

rye bread

white chip

�at iron steak

linguine pasta

fennel

whole wheat bread

baking mix

alfredo pasta sauce

margarine

confectioners' sugarfruit gelatin mix

pork

balsamic vinegar

pork loin chop

jicama

pre pizza crust

triple sec

teriyaki sauce

cola carbonated beverage

polish sausage

cracked black pepper

poblano chile pepper

individually wrapped caramel

roast beef

bread stu�ng mix

eggnog

pear

caramel

beet

worcestershire sauce

chicken stock

horseradish

semisweet chocolate chip

basil

red grape

plum

cinnamon sugar

fajita seasoning

rice noodle

powdered milk

star anise pod

short grain rice

ramen noodle

vegetable

coconut oil

whiskey

lime gelatin mix

peanut oil

ham

ginger root

lima bean

pimento stu�ed green olive

hoisin sauce

round steak

stu�ng

part skim ricotta cheese

broiler fryer chicken up

milk chocolate chip

turbinado sugar

vegetable shortening

tarragon vinegar

golden delicious apple

turkey

rigatoni pasta

stu�ng mix

milk

juiced

burgundy wine

red kidney bean

dill

candied pineapple

german chocolate cake mix

arborio rice

sugar free vanilla pudding mix

pine nut

green apple

cucumber oreganopearl onion

stu�ed green olive

whipped topping mix

broccoli

pinto bean

pasta

beef short rib

gelatin

garlic powder

rutabaga

chicken liver

pepperjack cheese

herb

lemon gras

sweet potato

pineapple ring

parsley �ake

pie �lling

spice cake mix

butterscotch chip

greek yogurt

vanilla ice cream

seafood seasoning

parsnip

applesauce

chinese �ve spice powder

salt pepper

beef broth

cherry tomato

sage

vanilla

vital wheat gluten

artichoke heart

mixed berry

bacon dripping

self rising �our

nilla wafer

navy bean

bacon

egg yolk

wonton wrapper

chocolate pudding mix

salsa

coconut

tomato based chili sauce

marsala wine

mussel

manicotti shell

anise extract

mustard seed

nutmeg

cayenne pepper

black bean pepperokra

asparagu

mustard powder

�rmly brown sugar

balsamic vinaigrette dressing

chicken breastoyster

ditalini pasta

old bay seasoning tm

brown rice

process american cheese

chocolate

miso paste

pineapple

iceberg lettuce

pearl barley

oat

greek seasoning

biscuit

clove

browning sauce

chicken bouillon powder

green pea

bread dough

cream cheesepeanut butter chip

silken tofu

pineapple chip

sea scallop

ricotta cheese

papaya

red cabbage

egg substitute

zesty italian dressing

devil's food cake mix

bagel

sour mix

lamb

irish stout beer

sea salt

romaine lettuce

kalamata olive

salt

monosodium glutamate

rice wine

white potato

rum extract

grape jelly

crescent roll dough

beer

phyllo dough

fettuccine pasta

chili seasoning mix

biscuit mix

candy coated chocolate

green cabbage

ranch bean

cream of celery soup

apple pie �lling

caper

nectarine

white mushroom

banana

orange gelatin mix

1% buttermilk

apple jelly

dinner roll

sugar pumpkin

salad green

shrimp

cheese ravioli

chicken wing

sour cream

saltine

cornmeal

mixed vegetable

beef tenderloin

sherry

rotini pasta

mexican cheese blend

kosher salt black pepper

mayonnaise

lobster

white onion

chocolate cookie

white bread

french baguette

bread

vanilla frosting

anise seed

ranch dressing mix

wild rice

hot

canadian bacon

corn�akes cereal

wax bean

cantaloupe

non fat yogurt

lite whipped topping

spaghetti squash

egg roll wrapper

solid pack pumpkin

recipe pastry

asafoetida powder

co�ee powder

italian sauce

amaretto liqueur

shortening

turmeric

semolina �our

pomegranate juice

corned beef

skewer

shallotspanish onion

tapioca

provolone cheese

chile sauce

vanilla bean

chile pepperangel hair pasta

pumpkin

tilapia

brie cheese

cottage cheese

banana liqueur

lemon

smoked salmon

ginger paste

brown mustard

peanut butter

escarole

sour milk

olive oil

country pork rib

pastry shell

adobo seasoning

candy coated milk chocolate

curryghee

alfredo sauce

yellow cake mix

granny smith apple

beef chuck

chocolate hazelnut spread

maple syrup

squid

gingersnap cooky

raspberry gelatin

molasse

lemon cake mix

�sh stock

cook

grenadine syrup

pu� pastry

rum

grapefruit juice

tahini

black pepperbutternut squash

key lime juice

sirloin steak

macaroni

butter shortening

brown lentil

chicken broth

chili bean

pickling spice

yellow food coloring

great northern bean

mixed nut

green chile

salmon

english mu�n

co�ee liqueur

non fat milk powder

buttermilk

distilled white vinegar

golden syrup

powdered fruit pectin

green chily

grape

raspberry gelatin mix

low fat sour cream

topping

pineapple juice

red lettuce

orange zest

ketchup

chunk chicken

steak seasoning

sandwich roll

crystallized ginger

kosher salt

roma tomato

red bean

red candied cherry

sesame seed

beef stock

cashew

popped popcorn

apricot nectar

any fruit jam

processed cheese food

red pepper

coleslaw mix

white cake mix

cherry pie �lling

canola oil

whole wheat �our

honey

long grain

marinara sauce

yellow summer squash

to�ee baking bit

whole milk

trout

onion separated

low fat cream cheese

corn oil

oat bran

cream of potato soup

allspice berry

mandarin orange

cumin

saltine cracker

swiss chard

fenugreek seed

�sh sauce

eggplant

baby corn

cider vinegar

orange sherbet

debearded

beef bouillon

kernel corn

vanilla vodka

chicken leg quarter

mintfeta cheese

lime juice

raspberry jam

cooking oil

white corn

herb stu�ng mix

lemon lime soda

pork sausage

ziti pasta

orange marmalade

yogurt

bean

ginger garlic paste

crescent dinner roll

scallop

walnut oil

smoked ham

red food coloring

triple sec liqueur

fat free evaporated milk

walnutbaking chocolate

blueberry

caramel ice cream topping

bacon grease

fat free italian dressing

steak

�g

miracle whip ‚Ñ

potato starch

luncheon meat

brandy based orange liqueur

smoked paprika

pu� pastry shell

raspberry preserve

apple butter

tomato sauce

white rice

beef stew meat

taco seasoning mix

date

whipped topping

marshmallow

co�ee

butterscotch schnapp

red wine vinegar

orange

chicken thigh

mild italian sausage

blueberry pie �lling

yeast

lime peel

rice �our

chocolate cake mix

barbecue sauce

monterey jack cheese

halibut

beef round steak

seed

sour cherry

pork sparerib

orange roughy

barley nugget cereal

leek

maraschino cherry

chickpea

fettuccini pasta

orange juice

blue cheese dressing

yam

garam masala

black eyed pea

penne pasta

serrano chile pepper

�ourchive

marjoram

herb stu�ng

beef sirloin

beef

maple extract

bamboo shoot

lemon extract

meat tenderizer

kielbasa sausage

low sodium chicken broth

asparagus

cod

italian seasoning

lime gelatin

vegetable bouillon

andouille sausage

collard green

blackberry

beef gravy

green grape

tamari

fruit

malt vinegar

strawberry gelatin

lemon gelatin

green olive

poultry seasoning

prune

beef consomme

chili powder

dressing

fennel seed

gruyere cheese

jellied cranberry sauce

chipotle pepper

vanilla extract

apricot

linguini pasta

cranberry sauce

port wine

process cheese

cornish game hen

cilantro

green chile pepper

wheat

bread machine yeast

tube pasta

biscuit baking mix

cream corn

spinach

low fat whipped topping

irish cream liqueur

candy

zucchini

mild cheddar cheese

orange gelatin cornstarch

cheese

snow pea

low fat margarine

green candied cherry

vermouth

brandy

white grape juice

corn bread mix

broccoli �oret

vidalia onion

cocktail sauce

pickled jalapeno pepper

beaten egg

hamburger bun

black walnut

dill pickle juice

dill pickle relish

habanero pepper

white chocolate chip

veal

powdered non dairy creamer

lasagna noodle

gingerapricot jam

imitation crab meat

chicken soup base

white bean

tarragon

onion soup mix

thousand island dressing

red lentil

pancake mix

wheat germ

fat free mayonnaise

yukon gold potato

long grain rice

carrot

cauli�ower �oret

vegetable cooking spray

craw�sh tail

peppermint extract

brussels sprout

onion salt buttermilk biscuit

white kidney bean

mango chutney

black olive

meatless spaghetti sauce

curry powder

coriander

red snapper

biscuit dough

sausage

cheddar cheese soup

lettuce

pork loin roast

lemon pepper

red curry paste

egg noodle

hot sauce

raspberry vinegar

butter cooking spray

peach schnapp

eggspicy pork sausage

mixed fruit

cat�sh

venison

yellow pepper

carbonated water

pumpkin seed

new potato

lemon juice

chocolate pudding

watermelon

chicken breast half

gorgonzola cheese

buttery round cracker

apple pie spice

process cheese sauce

jasmine rice

lemon pudding mix

cooking sherry

strawberry preserve

french bread

toothpick

sauce

corn tortilla chip

garlic paste

salt free seasoning blend

elbow macaronipickle

cream of chicken soup

cardamom pod

persimmon pulp

chicken

liquid smoke

cocoa

pound cake

bell pepper

food coloring

coconut extract

chocolate chip

berry cranberry sauce

red bell pepper

seashell pasta

american cheese

oatmeal

sourdough bread

cornbread

mixed salad green

arugula

oil

parmesan cheeseclam juice

brick cream cheesecereal

italian parsley

milk chocolate

rice wine vinegar

hot dog bun

pistachio pudding mix

curd cottage cheese

garlic salt

chocolate cookie crust

orange extract

cream of mushroom soup

sa�ron

mushroom

tortilla chip

white hominy

green beans snapped

dill pickle

french onion soup

skim milk

tequila

�ax seed

low fat cheddar cheese

red wine

nut

apple cider

candied cherry

cheddar cheese

gingerroot

chocolate frosting

low fat yogurt

peppercorn

pepperoni

artichoke

baby pea

crisp rice cereal

potato chip

coconut cream

angel food cake mix

onion �ake

salad shrimp

taco seasoning

champagne

peach

low fat

yellow cornmeal

pork roast

baby spinach

portobello mushroom cap

blue cheese

strawberry gelatin mix

pink lemonade

chestnut

strawberry

oyster sauce

sugar snap pea

ka�r lime

anchovy

stu�ed olive

herb bread stu�ng mix

half and half

serrano pepper

coconut rum

red apple

cherry

�ank steak

round

peppermint candy

butter bean

almond

white vinegarcelery seed

corn syrupfat free cream cheese

cannellini bean

clam

mustard

scallion

potato �ake

parsley

fat free yogurt

pita bread round

red pepper �ake

onion

bourbon whiskey

creme de menthe liqueur

golden raisin

pancetta bacon

apple juice

egg white

fontina cheese

kale

asiago cheese

spiced rum

farfalle pasta

lobster tail

mirin

leg of lamb

tomato

zested

sauerkraut

unpie crust

bourbon

lean beef

tuna steak

wild rice mix

raisin

chocolate syrup

juice

cajun seasoning

cauli�owerwaterlemon yogurt

tapioca �our

vanilla yogurt

pimiento

hazelnut liqueur

thyme

part skim mozzarella cheese

mandarin orange segment

cinnamon

corn tortilla

crispy rice cereal

colby monterey jack cheese

apricot preserve

chipotle chile powder

swiss cheese

white wine

baking powdergraham cracker crust

vanilla wafer

lime

Figure 2: Ingredient complement network. Two ingredients share an edge if they occur together more than would be expected by chance and if their pointwise mutual information exceeds a threshold.

evident as a constellation of small nodes located near the top of the sweet cluster in the visualization of Figure 2. The cluster includes the following ingredients: lime, rum, ice, orange, pineapple juice, vodka, cranberry juice, lemonade, tequila, etc.

For each recipe we examined the minimum, average, and maximum pairwise pointwise mutual information between ingredients. The intuition is that complementary ingredients would yield higher ratings, while ingredients that don’t go together would lower the average rating. We found that while the average and minimum pointwise mutual information between ingredients are uncorrelated with ratings, the maximum is very slightly positively correlated with the average rating for the recipe (ρ = 0.09, p-value < 10⁻¹⁰). This suggests that having at least two complementary ingredients very slightly boosts a recipe’s prospects, but having clashing or unrelated ingredients does not seem to do harm.
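A minimal Python sketch of the per-recipe summary. The pointwise mutual information estimator itself is defined in Section 5, outside this excerpt; here we assume p(a) is the fraction of recipes containing a and p(a, b) the fraction containing both, and the function names are illustrative only.

```python
import math
from itertools import combinations

def pmi_table(recipes):
    """Estimate pmi(a; b) = log[p(a, b) / (p(a) p(b))] from recipe co-occurrence.

    Assumption: p(a) is the fraction of recipes containing a, and p(a, b)
    the fraction of recipes containing both a and b.
    """
    n = len(recipes)
    single, pair = {}, {}
    for recipe in recipes:
        items = sorted(set(recipe))
        for a in items:
            single[a] = single.get(a, 0) + 1
        for a, b in combinations(items, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return {
        (a, b): math.log((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }

def pmi_summary(recipe, pmi):
    """(min, mean, max) pairwise PMI over a recipe's ingredient pairs."""
    items = sorted(set(recipe))
    values = [pmi[p] for p in combinations(items, 2) if p in pmi]
    if not values:
        return None  # fewer than two known ingredients
    return min(values), sum(values) / len(values), max(values)

# Toy corpus, not the paper's data.
recipes = [
    ["flour", "sugar", "butter"],
    ["flour", "sugar", "egg"],
    ["garlic", "onion", "olive oil"],
    ["garlic", "onion", "butter"],
]
pmi = pmi_table(recipes)
print(pmi_summary(recipes[0], pmi))
```

Correlating each of the three summary columns with recipe ratings across the corpus would then yield coefficients of the kind reported above.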

6. RECIPE MODIFICATIONS

Co-occurrence of ingredients aggregated over individual recipes reveals the structure of cooking, but tells us little about how flexible the ingredient proportions are, or whether some ingredients could easily be left out or substituted. An experienced cook may know that apple sauce is a low-fat alternative to oil, or may know that nutmeg is often optional, but a novice cook may implement recipes literally, afraid that deviating from the instructions may produce poor results. While a traditional hardcopy cookbook would provide few such hints, they are plentiful in the reviews submitted by users who implemented the recipes, e.g. “This is a great recipe, but using fresh tomatoes only adds a few minutes to the prep time and makes it taste so much better”, or another comment about the same salsa recipe: “This is by far the best recipe we have ever come across. We did however change it just a little bit by adding extra onion.”

As the examples illustrate, modifications are reported even when the user likes the recipe. In fact, we found that 60.1% of recipe reviews contain words signaling modification, such as “add”, “omit”, “instead”, “extra” and 14 others. Furthermore, it is the reviews that include changes that have a statistically higher average rating (4.49 vs. 4.39, t-test p-value < 10⁻¹⁰) and lower rating variance (0.82 vs. 1.05, Bartlett test p-value < 10⁻¹⁰), as is evident in the distribution of ratings shown in Fig. 3. This suggests that flexibility in recipes is not necessarily a bad thing, and that reviewers who don’t mention modifications are more likely either to think of the recipe as perfect or to dislike it entirely.

Figure 3: Distribution of ratings (1 to 5) for reviews with and without modifications; the y-axis gives the proportion of reviews with a given rating.
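Both comparisons can be sketched with the Python standard library alone. The text does not say which t-test variant was used, so Welch's statistic is an assumption here, as is the normal approximation to its p-value, which is reasonable only because the review samples are very large. Bartlett's statistic for two groups is referred to a chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(T/2)).

```python
import math
from statistics import mean, variance

def welch_test(x, y):
    """Welch's t statistic with a two-sided normal-approximation p-value."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    t = (mean(x) - mean(y)) / se
    return t, math.erfc(abs(t) / math.sqrt(2))

def bartlett_two_groups(x, y):
    """Bartlett's test for equal variances of two samples (chi-square, 1 df)."""
    dofs = [len(x) - 1, len(y) - 1]
    variances = [variance(x), variance(y)]
    total = sum(dofs)  # N - k with k = 2 groups
    pooled = sum(d * v for d, v in zip(dofs, variances)) / total
    numerator = total * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dofs, variances))
    correction = 1 + (sum(1 / d for d in dofs) - 1 / total) / 3  # 3 * (k - 1) = 3
    stat = max(numerator / correction, 0.0)  # guard against tiny negative round-off
    return stat, math.erfc(math.sqrt(stat / 2))

# Toy ratings, not the paper's data.
with_mod = [5, 4, 5, 5, 4, 5] * 50
without_mod = [5, 1, 5, 3, 5, 2] * 50
print(welch_test(with_mod, without_mod))
print(bartlett_two_groups(with_mod, without_mod))
```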

In the following, we describe the recipe modifications extracted from user reviews, including adjustment, deletion and addition. We then present how we constructed an ingredient substitute network based on the extracted information.

6.1 Adjustments

Some modifications involve increasing or decreasing the amount of an ingredient in the recipe. In this and the following analyses, we split each review on punctuation such as commas and periods. We used simple heuristics to detect when a review suggested a modification: adding/using more/less of an ingredient counted as an increase/decrease. Doubling or increasing counted as an increase, while reducing, cutting, or decreasing counted as a decrease. While it is likely that there are other expressions signaling the adjustment of ingredient quantities, using this set of terms allowed us to compare the relative rate of modification, as well as the frequency of increase vs. decrease, between ingredients. The ingredients themselves were extracted by performing a maximal character match within a window following an adjustment term.

Figure 4: Modifications to the 50 most common ingredients, derived from recipe reviews. The x-axis gives (# reviews adjusting down)/(# recipes) and the y-axis (# reviews adjusting up)/(# recipes), both on log scales from 0.01 to 1. The line denotes equal numbers of suggested increases and decreases.
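A rough Python sketch of the adjustment heuristic. The trigger words are paraphrased from the description above; the regular expressions, the 40-character window, and the rule of taking the earliest (then longest) ingredient name in that window are assumptions standing in for the paper's maximal character match.

```python
import re

# Trigger patterns paraphrased from the text; the exact word lists are assumptions.
INCREASE_RE = re.compile(r"\b(?:add(?:ed|ing)?|us(?:e|ed|ing)) more\b|\bdoubl\w*|\bincreas\w*")
DECREASE_RE = re.compile(r"\b(?:add(?:ed|ing)?|us(?:e|ed|ing)) less\b|\breduc\w*|\bcut\w*|\bdecreas\w*")

def first_ingredient(span, ingredients):
    """Earliest ingredient mentioned in span; ties go to the longest name."""
    best = None
    for ing in ingredients:
        m = re.search(r"\b" + re.escape(ing) + r"\b", span)
        if m and (best is None or (m.start(), -len(ing)) < best[0]):
            best = ((m.start(), -len(ing)), ing)
    return best[1] if best else None

def detect_adjustments(review, ingredients, window=40):
    """(ingredient, "increase" | "decrease") pairs suggested by one review."""
    found = []
    for clause in re.split(r"[,.;!?]", review.lower()):  # split on punctuation
        for direction, pattern in (("increase", INCREASE_RE), ("decrease", DECREASE_RE)):
            for m in pattern.finditer(clause):
                ing = first_ingredient(clause[m.end():m.end() + window], ingredients)
                if ing is not None:
                    found.append((ing, direction))
    return found

print(detect_adjustments(
    "Great recipe, but I used less sugar and added more vanilla extract.",
    ["sugar", "brown sugar", "vanilla", "vanilla extract"]))
```

A trigger such as "cut" also fires on preparation steps ("cut the onions"), so counts from a heuristic like this are best read comparatively, as the text does.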

Figure 4 shows, for each ingredient, the ratio of the number of reviews suggesting modifications, either increases or decreases, to the number of recipes that contain the ingredient. Two patterns are immediately apparent. Ingredients that may be perceived as unhealthy, such as fats and sugars, are, with the exception of vegetable oil and margarine, more likely to be modified, and to be decreased. On the other hand, flavor enhancers such as soy sauce, lemon juice, cinnamon, and Worcestershire sauce, and toppings such as cheeses, bacon and mushrooms, are also likely to be modified; however, they tend to be added in greater, rather than lesser, quantities. Combined, the patterns suggest that good-tasting but “unhealthy” ingredients can be reduced, if desired, while spices, extracts, and toppings can be increased to taste.
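The coordinates plotted in Figure 4 are simple per-ingredient rates. A small sketch, assuming hypothetical dictionaries counting reviews that suggest increases (`up`) and decreases (`down`), and recipes containing each ingredient (`recipe_counts`):

```python
def adjustment_rates(up, down, recipe_counts):
    """Map ingredient -> (down rate, up rate, side of the diagonal).

    Rates are (# reviews adjusting in that direction) / (# recipes containing
    the ingredient), as on the axes of Figure 4.
    """
    rates = {}
    for ing, n_recipes in recipe_counts.items():
        d = down.get(ing, 0) / n_recipes
        u = up.get(ing, 0) / n_recipes
        side = "increase" if u > d else "decrease" if d > u else "balanced"
        rates[ing] = (d, u, side)
    return rates

# Toy counts, not the paper's data.
print(adjustment_rates({"salt": 2, "soy sauce": 30}, {"butter": 10, "salt": 1},
                       {"salt": 100, "butter": 50, "soy sauce": 200}))
```

Ingredients whose "side" is "decrease" fall below the line in Figure 4, and "increase" ones above it.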

6.2 Deletions and additions

Recipes are also frequently modified such that ingredients are omitted entirely. We looked for words indicating that the reviewer did not have an ingredient (and hence did not use it), e.g. “had no” and “didn’t have”. We further used “omit/left out/left off/bother with” as an indication that the reviewer had omitted the ingredients, potentially for other reasons. Because reviewers often used simplified terms, e.g. “vanilla” instead of “vanilla extract”, we compared words in proximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.
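A sketch of the fuzzy matching step, assuming character 4-grams counted with multiplicity and plain cosine similarity. The trigger phrases come from the text; the 20-character window after each trigger and the 0.25 similarity threshold are hypothetical choices.

```python
import math
import re
from collections import Counter

# Trigger phrases listed in the text (both apostrophe styles for "didn't").
OMIT_RE = re.compile(r"had no|didn['’]t have|omit\w*|left out|left off|bother with")

def char_ngrams(text, n=4):
    """Multiset of overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(c1, c2):
    dot = sum(v * c2[k] for k, v in c1.items())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

def detect_deletions(review, recipe_ingredients, window=20, threshold=0.25):
    """Recipe ingredients that a review appears to say were left out."""
    text = review.lower()
    deleted = set()
    if not recipe_ingredients:
        return deleted
    for m in OMIT_RE.finditer(text):
        grams = char_ngrams(text[m.end():m.end() + window])
        scores = {ing: cosine(grams, char_ngrams(ing)) for ing in recipe_ingredients}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            deleted.add(best)
    return deleted

print(detect_deletions("I omitted the vanilla and it was still great",
                       ["vanilla extract", "flour", "sugar"]))
```

The n-gram comparison is what lets the shorthand "vanilla" in the review match "vanilla extract" in the ingredient list.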

To identify additions, we simply looked for the word “add”, but omitted possible substitutions. For example, we would use “added cucumber”, but not “added cucumber instead of green pepper”, the latter of which we analyze in the following section. We then compared the addition to the list of ingredients in the recipe, considering the addition valid only if the ingredient does not already belong in the recipe.
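A matching sketch for additions. Skipping any clause that contains "instead of" is a simplification of "omitted possible substitutions", and locating the added ingredient by searching a hypothetical ingredient vocabulary within a fixed window is likewise an assumption.

```python
import re

ADD_RE = re.compile(r"\badd(?:ed|ing|s)?\b")

def detect_additions(review, recipe_ingredients, vocabulary, window=30):
    """Ingredients a review says were added that the recipe does not already list."""
    added = set()
    for clause in re.split(r"[,.;!?]", review.lower()):
        if "instead of" in clause:
            continue  # likely a substitution, handled in Section 6.3
        for m in ADD_RE.finditer(clause):
            span = clause[m.end():m.end() + window]
            for ing in vocabulary:
                if ing not in recipe_ingredients and re.search(r"\b" + re.escape(ing) + r"\b", span):
                    added.add(ing)
    return added

print(detect_additions(
    "I added cucumber. Also added cucumber instead of green pepper, and added onion",
    recipe_ingredients={"onion", "tomato"},
    vocabulary=["cucumber", "green pepper", "onion"]))
```

Here the second clause is skipped as a substitution, and "onion" is rejected because the recipe already lists it.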

Table 2 shows the correlations between ingredient modifications. As might be expected, the more frequently an ingredient occurs in recipes, the more times its quantity has the opportunity to be modified, as is evident in the strong correlation between the recipe frequency and both increases and decreases recommended in reviews. However, if we take the proportion of modifications to the number of recipes the ingredient appears in, these are typically negatively correlated with the frequency of the ingredient, e.g. deletions/recipe with ρ = −0.22, additions ρ = −0.25, increases ρ = −0.26. For example, salt is so essential, appearing in over 21,000 recipes, that we detected only 18 reviews where it was explicitly dropped. In contrast, Worcestershire sauce, appearing in 1,542 recipes, is dropped explicitly in 148 reviews.

As might also be expected, additions are positively correlated with increases, and deletions with decreases. However, additions and deletions are very weakly negatively correlated, indicating that an ingredient that is added frequently is not necessarily omitted more frequently as well.

Table 2: Correlations between ingredient modifications

              addition   deletion   increase   decrease
recipes         0.41       0.22       0.61       0.68
addition                  -0.15       0.79       0.11
deletion                              0.09       0.58
increase                                         0.39

6.3 Ingredient substitute network

Replacement relationships show whether one ingredient is preferable to another. The preference could be based on taste, availability, or price. Some ingredient substitution tables can be found online¹, but they are neither extensive nor contain information about the relative frequency of each substitution. Thus, we found an alternative source for extracting replacement relationships: users’ comments, e.g. “I replaced the butter in the frosting by sour cream, just to soothe my conscience about all the fatty calories”.

To extract such knowledge, we first parsed the reviews as follows: we considered several phrases to signal replacement relationships, such as “replace a with b”, “substitute a with b”, and “a instead of b”, and matched a and b to our list of ingredients.
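A Python sketch of this parsing step, normalizing every match to an (original, substitute) pair. The three regular expressions are an illustrative subset of such phrases, and resolving each captured fragment to a single ingredient by whole-word matching against a vocabulary is an assumption; note that in "a instead of b" the substitute comes first.

```python
import re

# (pattern, whether group 1 is the original ingredient); an illustrative subset.
PATTERNS = [
    (re.compile(r"replac\w* (?:the )?(.+?) (?:with|by) (?:the )?(.+)"), True),
    (re.compile(r"substitut\w* (?:the )?(.+?) with (?:the )?(.+)"), True),
    (re.compile(r"(.+?) instead of (?:the )?(.+)"), False),
]

def pick(text, vocabulary, last=False):
    """Earliest (or latest-ending) whole-word vocabulary match; longer names win ties."""
    found = [(m.start(), m.end(), ing)
             for ing in vocabulary
             for m in re.finditer(r"\b" + re.escape(ing) + r"\b", text)]
    if not found:
        return None
    if last:
        return max(found, key=lambda t: (t[1], t[1] - t[0]))[2]
    return min(found, key=lambda t: (t[0], t[0] - t[1]))[2]

def extract_substitutions(review, vocabulary):
    """(original, substitute) pairs suggested by one review."""
    pairs = []
    for clause in re.split(r"[,.;!?]", review.lower()):
        for pattern, first_is_original in PATTERNS:
            m = pattern.search(clause)
            if not m:
                continue
            # For "a instead of b", the substitute a sits just before "instead of".
            first = pick(m.group(1), vocabulary, last=not first_is_original)
            second = pick(m.group(2), vocabulary)
            if first and second and first != second:
                pairs.append((first, second) if first_is_original else (second, first))
            break  # use only the first pattern that matches a clause
    return pairs

print(extract_substitutions(
    "I replaced the butter in the frosting by sour cream, just to soothe my conscience. "
    "Next time I used applesauce instead of oil.",
    ["butter", "sour cream", "cream", "applesauce", "oil"]))
```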

¹ e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx

Figure 5: The network of ingredient substitution. Nodes are sized according to the number of times they have been recommended as a substitute for another ingredient in reviews, and colored according to their indegree.

We constructed an ingredient substitute network to capture users’ knowledge about ingredient replacement. This weighted, directed network consists of ingredients as nodes. We thresholded and eliminated any suggested substitutions that occurred fewer than 5 times. We then determined the weight of each edge by p(b|a), the proportion of substitutions of ingredient a that suggest ingredient b. For example, 68% of substitutions for white sugar were to splenda, an artificial sweetener, and hence the assigned weight for the sugar → splenda edge is 0.68. The resulting network is shown in Figure 5.
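Given extracted (original, substitute) pairs, the edge weights can be sketched as below. The text does not state whether p(b|a) is normalized over all suggested substitutions of a or only over those surviving the threshold; this sketch assumes the former. The example counts are invented so as to reproduce the 0.68 weight quoted above.

```python
from collections import Counter

def substitute_network(pairs, min_count=5):
    """Directed edges a -> b weighted by p(b | a).

    pairs: (original, substitute) tuples, one per suggestion in a review.
    Edges suggested fewer than min_count times are dropped.
    """
    counts = Counter(pairs)
    totals = Counter()
    for (a, _), c in counts.items():
        totals[a] += c  # all suggested substitutions for a
    return {(a, b): c / totals[a] for (a, b), c in counts.items() if c >= min_count}

# Invented counts: 17 of 25 substitutions for white sugar go to splenda.
pairs = ([("white sugar", "splenda")] * 17 + [("white sugar", "honey")] * 5
         + [("white sugar", "stevia")] * 3)
print(substitute_network(pairs))
```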

The substitution network shown in Fig. 5 exhibits strong clustering. We examined this structure by applying the map generator tool by Rosvall et al. [13], which uses a random walk approach to identify clusters in weighted, directed networks. The resulting clusters, and their relationships to one another, are shown in Fig. 6. The derived clusters could be used when following a relatively new recipe which may not have many reviews, and therefore not many suggestions for ingredient substitutions. If one does not have all ingredients at hand, one could examine the contents of one’s fridge and pantry and match them with other ingredients found in the same cluster as the ingredient called for by the recipe. Table 3 lists the contents of a few such sample ingredient clusters, and Fig. 7 shows two example clusters extracted from the substitute network.
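The fridge-and-pantry lookup described above can be sketched as follows. It assumes a hypothetical mapping from ingredient to cluster id (such as the output of a community-detection run like the one described) and, optionally, the substitute-network weights p(b|a) to rank candidates. The toy clusters echo Fig. 7; the weights are invented.

```python
def pantry_substitutes(missing, clusters, pantry, weights=None):
    """Pantry items in the same substitution cluster as the missing ingredient.

    clusters: ingredient -> cluster id (hypothetical output of community detection).
    weights: optional {(a, b): p(b | a)} used to rank candidates, highest first.
    """
    cid = clusters.get(missing)
    if cid is None:
        return []
    weights = weights or {}
    candidates = [item for item in pantry if item != missing and clusters.get(item) == cid]
    return sorted(candidates, key=lambda b: (-weights.get((missing, b), 0.0), b))

clusters = {"milk": 1, "soy milk": 1, "half and half": 1, "buttermilk": 1,
            "cinnamon": 2, "nutmeg": 2}
pantry = ["nutmeg", "buttermilk", "soy milk", "bread"]
weights = {("milk", "soy milk"): 0.3, ("milk", "buttermilk"): 0.1}  # invented values
print(pantry_substitutes("milk", clusters, pantry, weights))
```

Without weights the candidates fall back to alphabetical order, which keeps the output deterministic.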

Table 3: Clusters of ingredients that can be substituted for one another. A maximum of 5 additional ingredients for each cluster are listed, ordered by PageRank.

main                 other ingredients
chicken              turkey, beef, sausage, chicken breast, bacon
olive oil            butter, apple sauce, oil, banana, margarine
sweet potato         yam, potato, pumpkin, butternut squash, parsnip
baking powder        baking soda, cream of tartar
almond               pecan, walnut, cashew, peanut, sunflower seed
apple                peach, pineapple, pear, mango, pie filling
egg                  egg white, egg substitute, egg yolk
tilapia              cod, catfish, flounder, halibut, orange roughy
spinach              mushroom, broccoli, kale, carrot, zucchini
italian seasoning    basil, cilantro, oregano, parsley, dill
cabbage              coleslaw mix, sauerkraut, bok choy, napa cabbage

Figure 6: Ingredient substitution clusters. Nodes represent clusters and edges indicate the presence of recommended substitutions that span clusters. Each cluster represents a set of related ingredients which are frequently substituted for one another.

Finally, we examined whether the substitution network encodes preferences for one ingredient over another, as evidenced by the relative ratings of similar recipes, one which contains an original ingredient, and another which implements a substitution. To test this hypothesis, we construct a “preference network”, in which one ingredient is preferred to another in terms of received ratings. It is constructed by creating an edge (a, b) between a pair of ingredients whenever a is listed only in recipe Y, b is listed only in recipe X, and the recipe ratings satisfy $R_X > R_Y$. For example, if the higher-rated recipe X includes beef, ketchup and cheese, and recipe Y contains beef and pickles, then this recipe pair contributes two edges: one from pickles to ketchup, and the other from pickles to cheese. The aggregate edge weights are defined based on PMI. Because PMI is a symmetric quantity (pmi(a; b) = pmi(b; a)), we introduce a directed PMI measure to cope with the directionality of the preference network:

$$\mathrm{pmi}(a \to b) = \log \frac{p(a \to b)}{p(a)\,p(b)},$$

where

$$p(a \to b) = \frac{\#\text{ of recipe pairs from } a \text{ to } b}{\#\text{ of recipe pairs}},$$

and p(a), p(b) are defined as in the previous section.

Figure 7: Relationships between ingredients located within two of the clusters from Fig. 6: (a) milk substitutes; (b) cinnamon substitutes.

Comparing the substitution network with this preference network, we found high correlations between the two networks (ρ = 0.72, p < 0.001). This observation suggests that the substitute network encodes users’ ingredient preferences, which we will use in the recipe prediction task described in the next section.
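A sketch of the directed PMI on the preference network. It follows the edge direction of the worked example (from an ingredient found only in the lower-rated recipe to one found only in the higher-rated recipe) and assumes p(a) is the fraction of recipes containing a; the function and argument names are illustrative.

```python
import math
from collections import Counter

def directed_pmi(ranked_pairs, recipe_freq, n_recipes):
    """pmi(a -> b) = log[p(a -> b) / (p(a) p(b))] on the preference network.

    ranked_pairs: (higher, lower) tuples of ingredient sets, the first recipe
        having the higher rating.
    recipe_freq: ingredient -> number of recipes containing it, so that
        p(a) = recipe_freq[a] / n_recipes (assumed definition).
    """
    edge_counts = Counter()
    for higher, lower in ranked_pairs:
        for a in lower - higher:        # only in the lower-rated recipe
            for b in higher - lower:    # only in the higher-rated recipe
                edge_counts[(a, b)] += 1
    n_pairs = len(ranked_pairs)
    return {
        (a, b): math.log((c / n_pairs)
                         / ((recipe_freq[a] / n_recipes) * (recipe_freq[b] / n_recipes)))
        for (a, b), c in edge_counts.items()
    }

higher = {"beef", "ketchup", "cheese"}   # recipe X, rated higher
lower = {"beef", "pickles"}              # recipe Y
freq = {"beef": 2, "ketchup": 1, "cheese": 1, "pickles": 1}
print(directed_pmi([(higher, lower)], freq, n_recipes=2))
```

Comparing these values with the substitute-network weights over edges present in both networks would be the final step; the exact correlation measure is not specified here.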

7. RECIPE RECOMMENDATION

We use the above insights to uncover novel recommendation algorithms suitable for recipe recommendations. We use ingredients and the relationships encoded between them in ingredient networks as our main feature sets to predict recipe ratings, and compare them against features encoding nutrition information, as well as other baseline features such as cooking methods, and preparation and cook time. Then we apply a discriminative machine learning method, stochastic gradient boosting trees [7], to predict recipe ratings.

In the experiments, we seek to answer three questions useful for recipe recommendation: (1) Can we predict users’ preference for a new recipe given the information present in the recipe? (2) What are the key aspects that determine users’ preference? (3) Does the structure of ingredient networks help in recipe recommendation, and how?

We shall answer these questions through a prediction task.

7.1 Recipe Pair Prediction

The goal of our prediction task is: given a pair of similar recipes, determine which one has the higher average rating. This task is designed particularly to help users who have a specific dish or meal in mind and are trying to decide between several recipe options for that dish.

Recipe pair data. The data for this prediction task consists of pairs of similar recipes. The reason for selecting similar recipes, with high ingredient overlap, is that while apples may be quite comparable to oranges in the context of recipes, especially if one is evaluating salads or desserts, lasagna may not be comparable to a mixed drink. To derive pairs of related recipes, we computed the cosine similarity between the ingredient lists of the two recipes, weighted by the inverse document frequency, log(# of recipes / # of recipes containing the ingredient). We considered only those pairs of recipes whose cosine similarity exceeded 0.2. The weighting is intended to assign higher similarity to recipes sharing more distinguishing ingredients, such as Brussels sprouts, as opposed to recipes sharing very common ones, such as butter.
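A sketch of the pair selection, assuming each ingredient present in a recipe gets weight idf(i) and absent ones weight 0. The all-pairs loop is quadratic and only meant for illustration; the toy corpus is invented.

```python
import math

def idf_weights(recipes):
    """idf(i) = log(# of recipes / # of recipes containing i)."""
    df = {}
    for recipe in recipes:
        for ing in set(recipe):
            df[ing] = df.get(ing, 0) + 1
    return {ing: math.log(len(recipes) / c) for ing, c in df.items()}

def idf_cosine(r1, r2, idf):
    """Cosine similarity of idf-weighted binary ingredient vectors."""
    a, b = set(r1), set(r2)
    dot = sum(idf[i] ** 2 for i in a & b)
    norm = math.sqrt(sum(idf[i] ** 2 for i in a)) * math.sqrt(sum(idf[i] ** 2 for i in b))
    return dot / norm if norm else 0.0

def similar_pairs(recipes, threshold=0.2):
    """Index pairs of recipes whose idf-weighted cosine similarity exceeds threshold."""
    idf = idf_weights(recipes)
    return [(i, j)
            for i in range(len(recipes)) for j in range(i + 1, len(recipes))
            if idf_cosine(recipes[i], recipes[j], idf) > threshold]

recipes = [["butter", "flour", "brussels sprout"],
           ["butter", "brussels sprout", "bacon"],
           ["butter", "flour", "sugar"],
           ["butter", "sugar", "cocoa"]]
print(similar_pairs(recipes))
```

Because butter appears in every toy recipe, its idf is 0 and sharing it contributes nothing, which is the effect the weighting is meant to have.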

A further challenge to obtaining reliable relative rankings of recipes is the variance introduced by having different users choose to rate different recipes. In addition, some users might not have a sufficient number of reviews under their belt to have calibrated their own rating scheme. To control for variation introduced by users, we examined recipe pairs where the same users rated both recipes and collectively expressed a preference for one recipe over the other. Specifically, we generated 62,031 recipe pairs (a, b) where $\mathrm{rating}_i(a) > \mathrm{rating}_i(b)$ for at least 10 users i, and for over 50% of the users who rated both recipe a and recipe b. Furthermore, each user i had to be an active enough reviewer to have rated at least 8 other recipes.
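A sketch of this filter over a hypothetical `user -> {recipe: rating}` dictionary. Applying the activity requirement to every co-rater (not only to the users who prefer a) is one possible reading and is an assumption here, as is treating tied ratings as expressing no preference.

```python
from itertools import combinations

def preferred_pairs(ratings, min_users=10, min_other=8):
    """Recipe pairs (a, b) that the same users collectively rate a above b.

    ratings: user -> {recipe: rating}. Only users who rated at least
    min_other recipes besides the two in a pair are counted (assumed reading).
    """
    wins, corated = {}, {}
    for user_ratings in ratings.values():
        if len(user_ratings) < min_other + 2:
            continue  # not an active enough reviewer
        for a, b in combinations(sorted(user_ratings), 2):
            corated[(a, b)] = corated.get((a, b), 0) + 1
            if user_ratings[a] != user_ratings[b]:
                hi, lo = (a, b) if user_ratings[a] > user_ratings[b] else (b, a)
                wins[(hi, lo)] = wins.get((hi, lo), 0) + 1
    return [
        (a, b) for (a, b), w in wins.items()
        if w >= min_users and w > 0.5 * corated[tuple(sorted((a, b)))]
    ]

# Toy data: 12 active users all prefer r0 to r1; one casual user is ignored.
ratings = {f"u{i}": {f"r{k}": (5 if k == 0 else 3 if k == 1 else 4) for k in range(10)}
           for i in range(12)}
ratings["casual"] = {"r0": 1, "r1": 5, "r2": 3}
print(("r0", "r1") in preferred_pairs(ratings))
```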

Features. In the prediction dataset, each observation consists of a set of predictor variables, or features, that represent information about the two recipes, and the response variable is a binary indicator of which recipe gets the higher rating on average. To study the key aspects of recipe information, we constructed different sets of features, including:

• Baseline: This includes cooking methods, such as chopping, marinating, or grilling, and cooking effort descriptors, such as preparation time in minutes, as well as the number of servings produced, etc. These features are considered primary information about a recipe and are included in all other feature sets described below.

• Full ingredients: We selected up to 1000 popular ingredients to build a “full ingredient list”. In this feature set, each observed recipe pair contains a vector with entries indicating, for each recipe in the pair, whether each ingredient in the full list is present.

• Nutrition: This feature set does not include any ingredients but only nutrition information, such as the total caloric content, as well as the amounts of fats, carbohydrates, etc.

• Ingredient networks: In this set, we replaced the full ingredient list by structural information extracted from different ingredient networks, as described in Sections 5 and 6.3. Co-occurrence is treated separately as a raw count and as complementarity, captured by the PMI.

• Combined set: Finally, a combined feature set is constructed to test the performance of a combination of features, including baseline, nutrition and ingredient networks.

To build the ingredient network feature set, we extracted the following two types of structural information from the co-occurrence and substitution networks, as well as the complement network derived from the co-occurrence information:

Network positions are calculated to represent how a recipe’s ingredients occupy positions within the networks. Such position measures are likely to indicate whether a recipe contains any “popular” or “unusual” ingredients. To calculate the position measures, we first calculated various network centrality measures, including degree centrality, betweenness centrality, etc., from the ingredient networks. A centrality measure can be represented as a vector $\vec{g}$ where each entry indicates the centrality of an ingredient. The network position of a recipe, with its full ingredient list represented as a binary vector $\vec{f}$, can be summarized by $\vec{g}^{\,T}\vec{f}$, i.e., an aggregated centrality measure based on the centrality of its ingredients.
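Because f is binary, g^T f is just the sum of the centralities of the recipe's ingredients. A sketch using normalized degree centrality on a hypothetical adjacency-set representation of an undirected ingredient network; betweenness or any other centrality vector would plug into the same dot product.

```python
def degree_centrality(adjacency):
    """Normalized degree centrality deg(v) / (n - 1) of an undirected network.

    adjacency: ingredient -> set of neighboring ingredients (hypothetical format).
    """
    n = len(adjacency)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adjacency.items()}

def network_position(recipe, vocabulary, centrality):
    """g^T f for a recipe: f is its binary ingredient vector, g the centralities."""
    f = [1 if ing in recipe else 0 for ing in vocabulary]
    g = [centrality.get(ing, 0.0) for ing in vocabulary]
    return sum(gi * fi for gi, fi in zip(g, f))

adjacency = {"salt": {"butter", "flour", "garlic"}, "butter": {"salt", "flour"},
             "flour": {"salt", "butter"}, "garlic": {"salt"}}
vocabulary = ["butter", "flour", "garlic", "salt"]
centrality = degree_centrality(adjacency)
print(network_position({"salt", "garlic"}, vocabulary, centrality))
```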

Figure 8: Prediction accuracy of the baseline, full ingredients, nutrition, ingredient networks, and combined feature sets. The nutrition information and ingredient networks are more effective features than full ingredients. The ingredient network features lead to impressive performance close to the best performance, indicating the power of network structures in recipe recommendation.

Figure 9: Relative importance of the top 100 features in the combined set, grouped as ingredient networks (84%), nutrition (6.5%), cook effort (5.0%), and cook methods (3.9%). The individual items from nutrition information are very indicative in differentiating high-rated recipes, while most of the prediction power comes from ingredient networks.

Network communities provide information about which ingredients are likely to co-occur with a group of other ingredients in the network. A recipe consisting of ingredients that are frequently used with, complemented by, or substituted by certain groups may be predictive of the ratings the recipe will receive. To obtain the network community information, we applied latent semantic analysis (LSA) on recipes. We first factorized each ingredient network, represented by matrix W, using singular value decomposition (SVD). In the matrix W, each entry $W_{ij}$ indicates whether ingredient i co-occurs with, complements, or substitutes ingredient j.

Suppose $W_k = U_k \Sigma_k V_k^T$ is a rank-k approximation of W. We can then transform each recipe’s full ingredient list using the low-dimensional representation $\Sigma_k^{-1} V_k^T \vec{f}$ as community information within a network. These low-dimensional vectors, together with the vectors of network positions, constitute the ingredient network features.
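With NumPy, the projection can be sketched as below; this illustrates the notation above and is not the authors' code. Here k must not exceed the number of nonzero singular values of W, or the inverse of Σ_k is undefined. The toy matrix is invented.

```python
import numpy as np

def community_projection(W, k):
    """P = Sigma_k^{-1} V_k^T from the SVD W = U Sigma V^T (rank-k truncation)."""
    _, s, Vt = np.linalg.svd(W)  # singular values s are sorted in descending order
    return np.diag(1.0 / s[:k]) @ Vt[:k, :]

def community_features(P, f):
    """k-dimensional community features for a binary ingredient vector f."""
    return P @ f

# Toy symmetric "network" over 4 ingredients with two blocks.
W = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0, 3.0]])
P = community_projection(W, k=2)
print(community_features(P, np.array([1.0, 1.0, 0.0, 0.0])))
```

This is the standard LSA "folding-in" step: applied to the rows of W itself, P recovers the corresponding rows of U_k, so new recipes land in the same k-dimensional space.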

Figure 10: Relative importance of the top 100 features representing the network structure: substitution (39.8%), co-occurrence (30.9%), and complement (29.2%). The substitution network has a stronger contribution (39.8%) to the total importance of ingredient network features than the other two networks, and it also has more influential features in the top 100 list, which suggests the information from the substitution network is complementary to other features.

Learning method. We applied discriminative machine learning methods such as support vector machines (SVM) [2] and stochastic gradient boosting trees [6] to our prediction problem. Here we report and discuss the detailed results based on the gradient boosting tree model. Like SVM, the gradient boosting tree model seeks a parameterized classifier, but unlike SVM, which considers all the features at one time, the boosting tree model considers a set of features at a time and iteratively combines them according to their empirical errors. In practice, it not only has competitive performance comparable to SVM, but can also serve as a feature ranking procedure [11].

In this work, we fitted a stochastic gradient boosting tree model with 8 terminal nodes under an exponential loss function. The dataset is roughly balanced in terms of which recipe is the higher-rated one within a pair. We randomly divided the dataset into a training set (2/3) and a testing set (1/3). The prediction performance is evaluated based on accuracy, and the feature performance is evaluated in terms of relative importance [9]. At each internal node of a single decision tree, one of the input variables, $x_j$, is used to partition the region associated with that node into two subregions in order to fit the response values. The squared relative importance of variable $x_j$ is the sum of such squared improvements over all internal nodes for which it was chosen as the splitting variable:

$$\mathrm{imp}(j) = \sum_k i_k^2 \, I(\text{node } k \text{ splits on } x_j),$$

where $i_k^2$ is the empirical improvement from the k-th node splitting on $x_j$ at that point.
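A sketch of how such scores are accumulated and then summarized as group shares, as in Figs. 9 to 11. The flattened tree representation (one (feature, improvement) pair per internal node, with the improvement already standing for $i_k^2$) is hypothetical, and averaging over the trees of the ensemble follows the usual convention for boosted trees rather than anything stated here.

```python
from collections import defaultdict

def squared_importance(trees):
    """imp(j): sum of improvements i_k^2 over nodes splitting on x_j, averaged over trees.

    trees: list of trees, each a list of (feature_index, improvement) pairs,
    one per internal node (a hypothetical flattened representation).
    """
    imp = defaultdict(float)
    for tree in trees:
        for feature, improvement in tree:
            imp[feature] += improvement
    return {j: v / len(trees) for j, v in imp.items()}

def group_shares(imp, groups):
    """Fraction of the total importance contributed by each feature group."""
    total = sum(imp.values())
    shares = defaultdict(float)
    for j, v in imp.items():
        shares[groups[j]] += v / total
    return dict(shares)

trees = [[(0, 4.0), (1, 1.0)], [(0, 2.0), (2, 3.0)]]
groups = {0: "ing. networks", 1: "nutrition", 2: "ing. networks"}
print(group_shares(squared_importance(trees), groups))
```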

7.2 Results

The overall prediction performance is shown in Fig. 8.

Surprisingly, even with a full list of ingredients, the prediction accuracy is only improved from .712 (baseline) to .746. In contrast, the nutrition information and ingredient networks are more effective (with accuracy .753 and .786, respectively). Both of them have much lower dimensions (from tens to several hundreds), compared with the full ingredients that are represented by more than 2000 dimensions (1000 ingredients per recipe in the pair). The ingredient network features lead to impressive performance close to the best performance given by the combined set (.792), indicating the power of network structures in recipe recommendation.

Figure 9 shows the influence of different features in the combined feature set. Up to 100 features with the highest relative importance are shown. The importance of a feature group is summarized by how much of the total importance is contributed by all features in the set. For example, the baseline, consisting of cooking effort and cooking methods, contributed 8.9% of the total importance. The individual items from nutrition information are very indicative in differentiating highly-rated recipes, while most of the prediction power comes from ingredient networks (84%).

Figure 11: Relative importance of features from nutrition information: carbs (20.9%), calories (19.7%), cholesterol (17.7%), sodium (16.8%), fat (12.4%), and fiber (12.3%). The carbs item is the most influential feature in predicting higher-rated recipes.

Figure 10 shows the top 100 features from the three networks. In terms of the total importance of ingredient network features, the substitution network has a slightly stronger contribution (39.8%) than the other two networks, and it also has more influential features in the top 100 list. This suggests that the structural information extracted from the substitution network is not only important but also complementary to information from other aspects.

Looking into the nutrition information (Fig. 11), we found that carbohydrates are the most influential feature in predicting higher-rated recipes. Since carbohydrates comprise around 50% or more of total calories, the high importance of this feature interestingly suggests that a recipe’s rating can be influenced by users’ concerns about nutrition and diet. Another interesting observation is that, while individual nutrition items are powerful predictors, a higher prediction accuracy can be reached by using ingredient networks alone, as shown in Fig. 8. This implies that the information about nutrition may have been encoded in the ingredient network structure, e.g. substitutions of less healthful ingredients with “healthier” alternatives.

Constructing the ingredient network features involves reducing high-dimensional network information through SVD, as described in the previous section. The dimensionality can be determined by cross-validation. As shown in Fig. 12, features with a very large dimension tend to overfit the training data. Hence we chose k = 50 for the reduced dimension of all three networks. The figure also shows that using the information about the complement network alone is more effective in prediction than using the co-occurrence or substitute network alone, even in the case of low dimensions. However, as shown in terms of relative importance (Fig. 10), while the substitution network alone is not the most effective, it provides more complementary information in the combined feature set.

Figure 12: Prediction performance over reduced dimensionality (x-axis: dimensions, 10 to 70; y-axis: accuracy, 0.76 to 0.80; series: combined, substitution, complement, co-occurrence). The dimensionality of network features can be determined by cross-validation. The best performance is given by reduced dimension k = 50 when combining all three networks. In addition, using the information about the complement network alone is more effective in prediction than using the other two networks.

Figure 13: Influential substitution communities (rows: ingredients; columns: SVD dimensions 1 to 5; color key from −0.5 to 0.5). The matrix shows the most influential feature dimensions extracted from the substitution network. For each dimension, the six representative ingredients with the highest intensity values in the decomposed matrix are shown, with colors indicating their intensity. These features suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension, are particularly informative in prediction.

In Fig. 13 we show the most representative ingredients in the decomposed matrix derived from the substitution network. We display the top five influential dimensions, evaluated based on relative importance, from the SVD resultant matrix $V_k$, and in each of these dimensions we extracted six representative ingredients based on their intensities in the dimension (the squared entry values). These representative ingredients suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension or the milk substitutes in the second dimension (which is similar to the cluster shown in Fig. 6), are particularly informative in predicting recipe ratings.
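Selecting representative ingredients by squared loadings can be sketched with NumPy as follows. The `dims` argument stands in for the dimensions ranked most important by the boosting model, and the toy loadings are invented.

```python
import numpy as np

def representative_ingredients(Vt_k, vocabulary, dims, top=6):
    """Ingredients with the largest squared loadings in selected SVD dimensions.

    Vt_k: k x n array whose rows are the right singular vectors (V_k^T).
    dims: indices of the dimensions to inspect, e.g. the most important ones.
    """
    result = {}
    for d in dims:
        order = np.argsort(-(Vt_k[d] ** 2))[:top]  # descending squared entries
        result[d] = [vocabulary[i] for i in order]
    return result

Vt_k = np.array([[0.1, -0.9, 0.3, 0.0],
                 [0.5, 0.1, -0.2, 0.8]])
vocab = ["milk", "splenda", "honey", "kale"]
print(representative_ingredients(Vt_k, vocab, dims=[0, 1], top=2))
```

Squaring the entries ranks ingredients by the magnitude of their loading, ignoring the arbitrary sign that SVD assigns to each singular vector.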

To summarize our observations from the experiments, we found we were able to effectively predict users’ preference for a recipe, but the prediction does not come from using a full list of ingredients. Instead, by using the structural information extracted from the relationships among ingredients, we can better uncover users’ preference about recipes.

8. CONCLUSION

Recipes are little more than instructions for combining and processing sets of ingredients. Individual cookbooks, even the most expansive ones, contain single recipes for each dish. The web, however, permits collaborative recipe generation and modification, with tens of thousands of recipes contributed to individual websites. We have shown how this data can be used to glean insights about regional preferences and modifiability of individual ingredients, and also how it can be used to construct two kinds of networks, one of ingredient complements, the other of ingredient substitutes. These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.

In future work, we plan to extend ingredient networks to incorporate the cooking methods as well. It would also be of interest to generate region-specific and diet-specific ratings, depending on the users' background and preferences. A whole host of user-interface features could be added for users who are interacting with recipes, whether the recipe is newly submitted, and hence unrated, or whether they are browsing a cookbook. In addition to automatically predicting a rating for the recipe, one could flag ingredients that can be omitted, ones whose quantity could be tweaked, as well as suggested additions and substitutions.

9. ACKNOWLEDGMENTS
This work was supported by MURI award FA9550-08-1-0265 from the Air Force Office of Scientific Research. The methodology used in this paper was developed with funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography.

10. REFERENCES
[1] S. M. Boback, C. L. Cox, B. D. Ott, R. Carmody, R. W. Wrangham, and S. M. Secor. Cooking and grinding reduces the cost of meat digestion. Comparative Biochemistry and Physiology - Part A: Molecular and Integrative Physiology, 148(3):651–656, 2007.

[2] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[3] P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In Proceedings of Recommender Systems, 2011.

[4] J. Freyne and S. Berkovsky. Intelligent food planning: personalized recipe recommendation. In IUI, pages 321–324. ACM, 2010.

[5] J. Freyne and S. Berkovsky. Recommending food: Reasoning on recipes and ingredients. User Modeling, Adaptation, and Personalization, pages 381–386, 2010.

[6] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.

[7] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.

[8] G. Geleijnse, P. Nachtigall, P. van Kaam, and L. Wijgergangs. A personalized recipe advice system to promote healthful choices. In IUI, pages 437–438. ACM, 2011.

[9] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 2005.

[10] F. Kamieth, A. Braun, and C. Schlehuber. Adaptive implicit interaction for healthy nutrition and food intake supervision. Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments, pages 205–212, 2011.

[11] Y. Lu, F. Peng, X. Li, and N. Ahmed. Coupling feature selection and machine learning methods for navigational query identification. In CIKM, pages 682–689. ACM, 2006.

[12] I. Rombauer, M. Becker, E. Becker, and L. Maestro. Joy of cooking. Scribner Book Company, 1997.

[13] M. Rosvall and C. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 105(4):1118, 2008.

[14] Y. Shidochi, T. Takahashi, I. Ide, and H. Murase. Finding replaceable materials in cooking recipe texts considering characteristic cooking actions. In Proc. of the ACM multimedia 2009 workshop on Multimedia for cooking and eating activities, pages 9–14. ACM, 2009.

[15] M. Svensson, K. Hook, and R. Coster. Designing and evaluating Kalas: A social navigation system for food recipes. ACM Transactions on Computer-Human Interaction (TOCHI), 12(3):374–400, 2005.

[16] M. Ueda, M. Takahata, and S. Nakajima. User's food preference extraction for personalized cooking recipe recommendation. In Proc. of the Second Workshop on Semantic Personalized Information Management: Retrieval and Recommendation, 2011.

[17] L. Wang, Q. Li, N. Li, G. Dong, and Y. Yang. Substructure similarity measurement in Chinese recipes. In WWW, pages 979–988. ACM, 2008.

[18] Wikipedia. Outline of food preparation, 2011. [Online; accessed 22-Oct-2011].

[19] R. Wrangham. Catching fire: how cooking made us human. Profile Books, 2010.

[20] Q. Zhang, R. Hu, B. Mac Namee, and S. Delany. Back to the future: Knowledge light case base cookery. In Proc. of The 9th European Conference on Case-Based Reasoning Workshop, page 15, 2008.
