Integration of USDA FoodData Central data into OFF

OFF now includes USDA data, which was added by downloading a CSV file and importing it. There may be benefits to using the API provided by the USDA instead. For example, the API appears to return more detailed nutritional information. There are many open questions, which can be discussed in this forum.

FYI, from questions that I have asked, it appears that we will be importing data from USDA FoodData Central when OFF is missing a food or when we see that changes have occurred. This may get us into the same situation that was noticed with the Fleury Michon data that comes via the producer app.

Ideally, I think that when we see different data, we would want to know whether this came from an earlier pull of the USDA data or from a person. I think that edits from a person should be respected over importable data, but we can flag the data for analysis when we see something in the imported data that would override a user edit.
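To make the rule above concrete, here is a minimal sketch of how it could be expressed. This is not how OFF stores provenance; the `FieldValue` class, the `"user"` and `"usda_import"` labels, and the `resolve` function are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldValue:
    value: str
    source: str  # hypothetical labels: "user" or "usda_import"


def resolve(current: Optional[FieldValue], incoming: FieldValue) -> Tuple[FieldValue, bool]:
    """Return (value to keep, needs_review flag) for one field."""
    if current is None or current.value == incoming.value:
        # Nothing stored yet, or no disagreement: nothing to review.
        return (current or incoming), False
    if current.source == "user" and incoming.source != "user":
        # Keep the person's edit, but flag the conflict for analysis.
        return current, True
    # An earlier import (or another user edit) is simply replaced.
    return incoming, False


kept, flagged = resolve(FieldValue("0", "user"), FieldValue("", "usda_import"))
print(kept, flagged)
```

The point is only that the decision needs to know where the stored value came from, which is why recording the source of each value matters.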

See Is it okay for producers to overwrite user-contributed data with bad data daily?

My pull request in support of this is here. It is very basic for now, so please do not judge it harshly.

I’ve updated the manual spreadsheet so that the values I assumed to be zero in the USDA CSV file are highlighted as N/A.

There are 7 N/As among these 10 items in the USDA CSV file. Each of these nutrients was listed as zero on the nutrition label as shown in OFF.

Question 1: What tollgates/requirements would you like me to check before replacing blank fiber and added sugar with zeros on this USDA import?

Question 2: Ray, any luck pulling the nutrition data for these 10 items from the API?

Sorry I did not see that this question was added. I obviously need to turn up the volume on my notifications.

I should have included @rkiddy. I’ll do that in the future.

@stephane What tollgates/requirements would you like me to check before replacing blank fiber and sugar with zeros?
For example, if we checked 100 items that had blank fiber and/or added sugar on the nutritional database and the USDA labels on OFF all had zero fiber and/or added sugar, would that be sufficient checking?

@Victor Here is what I found:

Not in USDA and not in OFF:
‘20200129783’
‘44400176002’
‘41570094754’
‘72745804113’
‘15800050117’

Not in USDA but in OFF:
‘0053000006329’

In USDA and OFF:
‘4099100028829’
‘850229005207’
‘856481003043’
‘4099100099157’

And we want to find things that are in the USDA data and not in OFF to import.

By the way, my understanding of where per 100g numbers come from and where per serving numbers come from was not right.

Both the search and the fetch from USDA give you per serving. It is just that the responses look different enough to be confusing.

From my PR:

 $ python3 usda_check.py --upc 4099100028829

     search located fdcId: 1122027
 
 Serving size = 39.0 g
 off_nutrients:
 ['calcium',
  'carbohydrates',
  'cholesterol',
  'energy-kcal',
  'energy',
  'fat',
  'fiber',
  'fruits-vegetables-nuts-estimate-from-ingredients',
  'iron',
  'monounsaturated-fat',
  'nova-group',
  'nutrition-score-fr',
  'pantothenic-acid',
  'polyunsaturated-fat',
  'potassium',
  'proteins',
  'salt',
  'saturated-fat',
  'sodium',
  'sugars',
  'trans-fat',
  'vitamin-b1',
  'vitamin-b6',
  'vitamin-b9',
  'vitamin-pp',
  'zinc']
 
 
 Nutrient                                             USDA Per 100g         USDA Per Serving       OFF Per Serving
   Calcium, Ca                                       | 26.0 MG             | 26.0 mg             | 26 mg               |
   Carbohydrate, by difference                       | 84.6 G              | 84.62 g             | 84.6154 g           |
   Cholesterol                                       | 0.0 MG              | 0.0 mg              | 0 mg                |
   Energy                                            | 385 KCAL            | 385.0 kcal          | 384.6154 kcal       |
   Fatty acids, total monounsaturated                | 1.28 G              | 1.28 g              | 1.28 g              |
   Fatty acids, total polyunsaturated                | 0.0 G               | 0.0 g               | 0 g                 |
   Fatty acids, total saturated                      | 0.0 G               | 0.0 g               | 0 g                 |
   Fatty acids, total trans                          | 0.0 G               | 0.0 g               | 0 g                 |
   Fiber, total dietary                              | 5.1 G               | 5.1 g               | 5.1 g               |
   Folic acid                                        | 51.0 UG             | 51.0 µg             | NO LONG NAME        |
   Iron, Fe                                          | 13.8 MG             | 13.85 mg            | 13.85 mg            |
   Niacin                                            | 5.13 MG             | 5.128 mg            | NO LONG NAME        |
   Pantothenic acid                                  | 2.56 MG             | 2.564 mg            | 2.564 mg            |
   Potassium, K                                      | 205 MG              | 205.0 mg            | 205 mg              |
   Protein                                           | 7.69 G              | 7.69 g              | 7.6923 g            |
   Sodium, Na                                        | 346 MG              | 346.0 mg            | 0.3461538 g         |
   Sugars, added                                     | 17.9 G              | 17.9 g              | 17.9487 g           |
   Sugars, total including NLEA                      | 18.0 G              | 17.95 g             | 17.9487 g           |
   Thiamin                                           | 2.0 MG              | 2.0 mg              | NO SHORT NAME       |
   Total lipid (fat)                                 | 2.56 G              | 2.56 g              | 2.5641 g            |
   Vitamin B-6                                       | 0.513 MG            | 0.513 mg            | 0.513 mg            |
   Vitamin D (D2 + D3), International Units          | 0.0 IU              | 0.0 IU              | NO SHORT NAME       |
   Zinc, Zn                                          | 5.77 MG             | 5.77 mg             | 5.77 mg             |
 
 Extra OFF Nutrients:
 ['energy-kcal',
  'fruits-vegetables-nuts-estimate-from-ingredients',
  'nova-group',
  'nutrition-score-fr',
  'salt',
  'vitamin-b1',
  'vitamin-b9',
  'vitamin-pp']

@rkiddy

If those are from my list, all of the items were in both the USDA list and OFF.

Try adding leading zeros when you search OFF.

Also, I am using the full, massive CSV file. It was so large that I had to open it in Notepad. If you attempt to open it in Excel, it only reads the first ten thousand lines or so.

If that’s not it, let’s jump on a call.

Victor

@rkiddy
Not in USDA and not in OFF:
‘20200129783’ in OFF and USDA with a leading zero
‘44400176002’ in OFF and USDA with a leading zero
‘41570094754’ in OFF and USDA with a leading zero
‘72745804113’ in OFF and USDA with a leading zero
‘15800050117’ in OFF and USDA with a leading zero

Not in USDA but in OFF:
‘0053000006329’ in OFF with 00, in USDA with a single leading zero.
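Since the mismatches above all come down to leading zeros, one way to avoid them is to compare barcodes on a normalized key. The sketch below assumes that stripping leading zeros is acceptable for comparison only (not as a storage format). The `barcode_key` helper and the set names are made up for the example; the codes are the forms reported in this thread.

```python
def barcode_key(code: str) -> str:
    """Comparison key only: surrounding whitespace and leading zeros removed."""
    return code.strip().lstrip("0")


# Forms reported in this thread; the set names are hypothetical.
off_codes = {"0053000006329", "020200129783"}
usda_codes = {"053000006329", "020200129783"}

off_keys = {barcode_key(c) for c in off_codes}
usda_keys = {barcode_key(c) for c in usda_codes}

for searched in ("20200129783", "0053000006329"):
    key = barcode_key(searched)
    print(searched, "in OFF:", key in off_keys, "in USDA:", key in usda_keys)
```

With a key like this, the same product is found whether it was typed with no leading zero, one, or two.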

The questions we are attempting to address are:

  1. Does the nutrition data from the USDA database, as returned by the API, match the CSV data when correcting for serving size? (Are the values from your Python script the same as mine from the spreadsheet?) If not, which of the two more closely matches the OFF data (with nutrition label pictures)?
  2. Does the USDA API data include zeros for added sugar and fiber in the places where they are missing for the 10 example UPC codes?
  3. Can we use your script to determine what percent of the time “missing” added sugar and fiber data from the USDA corresponds to zero added sugar and zero fiber on the nutrition labels as included in OFF?
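For question 3, the measurement itself could look something like the sketch below. It is not part of the existing script. It assumes we can build, for one nutrient, a list of (USDA value, OFF label value) pairs where a blank is represented as `None`. The function name and the sample pairs are hypothetical and are not real results.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]  # (USDA value, OFF label value)


def blank_is_zero_rate(pairs: Iterable[Pair]) -> Optional[float]:
    """Fraction of USDA blanks whose OFF label value is zero.

    Only pairs where USDA is blank and OFF has a value are counted.
    Returns None when there is nothing to count.
    """
    checked = 0
    zero = 0
    for usda_value, off_value in pairs:
        if usda_value is not None or off_value is None:
            continue
        checked += 1
        if off_value == 0:
            zero += 1
    return zero / checked if checked else None


# Made-up pairs for illustration only.
sample = [(None, 0.0), (None, 0.0), (None, 2.0), (5.1, 5.1), (None, None)]
rate = blank_is_zero_rate(sample)
print(f"{rate:.0%} of USDA blanks were zero on the OFF label")
```

Whatever threshold we agree on would then be applied to this rate, computed separately for fiber and for added sugar.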

Thank you. This kind of clarification is helpful.

No problem. In checking my Excel sheet, I found that Excel crops off leading zeros unless you make the number a string, and I didn’t catch it.
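One way around the Excel problem is to read the CSV with Python's `csv` module, which returns every field as a string and reads one row at a time, so leading zeros are kept and the file size matters less. This is only a sketch: the `gtin_upc` column name is my assumption about the USDA file layout, `read_upcs` is a made-up helper, and the two rows are an in-memory stand-in built from codes in this thread.

```python
import csv
import io


def read_upcs(handle, column="gtin_upc"):
    """Return the barcode column as strings, exactly as written in the file."""
    reader = csv.DictReader(handle)  # yields one row at a time
    return [row[column] for row in reader]


# In-memory stand-in for the real file; for a file on disk use
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("fdc_id,gtin_upc\n1122027,4099100028829\n,020200129783\n")

upcs = read_upcs(sample)
print(upcs)

# Converting to a number is what drops the zero:
print(str(int(upcs[1])))
```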

I put a question to the people who run the FoodData Central site. I said:

nutrition info per serving?
I see that the nutrient information is made available per 100g. Is it possible to retrieve, from any endpoint or with any request, the information nutrient info in quantities per serving?

The “FoodData Central Expert” said:

Thank you for your inquiry regarding FoodData Central’s API. The API is pulling data directly from the database which is stored in 100g or 100ml serving. Unfortunately, the API does not have any option for specifying the serving sizes. The application uses the data in the database to calculate out portion/serving sizes on the fly based off serving size weight in grams/ml, so you will need to do these calculations on your end.

So, it may have looked as though there was a way to get per-serving information directly from the USDA. I was confused and thought that this was true. It is not. No matter what, we will need to calculate the per-serving information ourselves.
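The calculation itself is just scaling by the serving size. Here is a minimal sketch, using the 39.0 g serving size and a few of the per-100 g values from the script output earlier in this thread; the `per_serving` function name is made up for the example.

```python
def per_serving(per_100g: float, serving_size_g: float) -> float:
    """Scale a per-100 g (or per-100 ml) amount to one serving."""
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    return per_100g * serving_size_g / 100.0


serving_size_g = 39.0  # from the script output for UPC 4099100028829
per_100g = {
    "Carbohydrate, by difference": 84.6,
    "Fiber, total dietary": 5.1,
    "Protein": 7.69,
}

for name, amount in per_100g.items():
    print(f"{name}: {per_serving(amount, serving_size_g):.2f} g per serving")
```

For example, 84.6 g of carbohydrate per 100 g works out to about 33 g for a 39 g serving.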

FYI, the USDA has updated the CSV files at their site:

https://fdc.nal.usda.gov/download-datasets.html

This page now shows files dated October 23, 2023. For example, see:

https://fdc.nal.usda.gov/fdc-datasets/FoodData_Central_branded_food_csv_2023-10-26.zip