Ingredients

  • Spoonacular offers a free downloadable list of ‘the 1000 most frequently used ingredients’.

  • A data set of ingredient lists pulled from Datafiniti's Product Database. It covers 10,000 different food listings and includes the ingredient list for each one.

  • Recipe5K includes a list of 1000+ base ingredients as a txt file

  • The Kaggle What’s Cooking dataset includes almost 4,500 ingredient items.

  • Recipe5K: a dataset for ingredient recognition with 4,826 unique recipes, each composed of an image and the corresponding list of ingredients. It does not include the recipe instructions, which should be retrieved from Yummly.

  • Yummly-28K and Yummly-66K: datasets crawled from the recipe-sharing website Yummly. They include ingredients (as part of JSON metadata files) and images, but not the recipe instructions.

See also the Dictionary of Food Ingredients (2011) by Robert Igoe, a technical source of information on over 1,000 food ingredients and additives. It provides a useful list of ingredient categories (e.g., Emulsifiers, Fats and Oils, Flavors, Flour, Spices, Starch, Sweeteners, Vitamins) and food categories (e.g., Cheese, Cream Products, Dressings, Frozen Desserts, Fruit Spreads, Macaroni and Noodle Products, Margarine and Butter, Milks, Syrups, Tomato Products, Yogurt).

Food composition (nutritional value)

  • ANSES-CIQUAL French food composition table, 2020 version: a table in Excel or XML format that breaks down various types of food (soups, desserts, dishes, vegetables, etc.) into their basic nutritional components (energy/calories, water, protein, carbohydrate, fat, etc.).

  • This paper estimates nutritional values for recipes.

  • Open Food Data includes, among other things, a link to a very large MongoDB database of food facts.

A procedure for normalizing ingredients

The paper Temporal Patterns in Online Food Innovation describes a procedure for ‘normalizing’ ingredients, assuming the ingredients start out as free-form text, e.g. lists of arbitrary strings defined by users. Because of word variants, misspellings, etc., such free-form text leads to a disambiguation problem. The procedure proposed to resolve such issues is as follows:

  1. Split conjunctions such as ‘salt and pepper’.

  2. Replace each alternative of two ingredients with the name of the more popular option, e.g., ‘butter or margarine’ is replaced with ‘butter’.

  3. Filter out stop-words, special characters, amounts and units, and words describing the preparation process, e.g., ‘cooked’, ‘washed’.

  4. Finally, replace ingredients occurring fewer than 200 times with more popular variants: try to match each such ingredient name against the others, starting from the most popular ones, e.g., the ingredient ‘the glass of salted water’ is replaced with ‘water’ and ‘salt’. Sometimes no variant can be matched; in this special case, ingredient names occurring fewer than 100 times are simply discarded.

In the cited paper, this procedure reduced the initial number of over 334 thousand ingredients in the dataset to 2,208.
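
The four steps above can be sketched in Python as follows. This is only an illustrative sketch, not the paper's implementation: the word lists are tiny hypothetical stand-ins, the function names (strip_noise, clean, merge_rare, normalize) are made up for this example, the popularity counter is assumed to come from an earlier counting pass, and plain substring matching is assumed for step 4 because the summary above does not say how variants are matched. It also assumes that unmatched names at or above the discard threshold are kept.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; real ones would be much larger.
STOP_WORDS = {"the", "a", "an", "of", "to", "taste"}
UNITS = {"cup", "cups", "glass", "tbsp", "tsp", "g", "kg", "ml"}
PREPARATION = {"cooked", "washed", "chopped", "diced"}
DROP = STOP_WORDS | UNITS | PREPARATION


def strip_noise(text):
    """Step 3: drop special characters, amounts, units, stop-words and preparation words."""
    words = re.findall(r"[a-z]+", text.lower())  # letters only, so digits and symbols vanish
    return " ".join(w for w in words if w not in DROP)


def clean(raw, popularity):
    """Steps 1-3 for one free-form string; returns a list of ingredient names."""
    names = []
    for part in re.split(r"\band\b", raw.lower()):  # step 1: split conjunctions
        options = [strip_noise(o) for o in re.split(r"\bor\b", part)]
        options = [o for o in options if o]
        if options:
            # Step 2: keep the more popular alternative (unknown names count as 0).
            names.append(max(options, key=lambda o: popularity[o]))
    return names


def merge_rare(counts, rare_below=200, discard_below=100):
    """Step 4: map each rare name to popular names it contains, or discard it."""
    popular = [n for n, c in counts.most_common() if c >= rare_below]
    mapping = {n: [n] for n in popular}
    for name, count in counts.items():
        if count >= rare_below:
            continue
        # Assumption: crude substring matching, most popular candidates first.
        matches = [p for p in popular if p in name]
        if matches:
            mapping[name] = matches
        elif count >= discard_below:
            mapping[name] = [name]  # unmatched, but frequent enough to keep
    return mapping  # names absent from the mapping are discarded


def normalize(raw_lists, popularity, rare_below=200, discard_below=100):
    """Run all four steps over a collection of free-form ingredient lists."""
    cleaned = [[n for raw in lst for n in clean(raw, popularity)] for lst in raw_lists]
    counts = Counter(n for lst in cleaned for n in lst)
    mapping = merge_rare(counts, rare_below, discard_below)
    return [[m for n in lst for m in mapping.get(n, [])] for lst in cleaned]


# Toy data with toy thresholds (the text above uses 200 and 100).
recipes = [
    ["salt and pepper", "2 tbsp butter or margarine"],
    ["the glass of salted water"],
    ["1 cup water", "salt to taste"],
    ["3 cups water", "chopped pepper"],
]
popularity = Counter({"butter": 10, "margarine": 3})
for ingredients in normalize(recipes, popularity, rare_below=2, discard_below=1):
    print(ingredients)
```

With these toy thresholds, ‘the glass of salted water’ is first reduced to ‘salted water’ and then mapped to ‘salt’ and ‘water’. Substring matching is a blunt choice (for example, ‘oil’ would match inside ‘boiled’), so a real pipeline would likely match on whole words or use fuzzy matching instead.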