Update Frequency of JSONL Data File

Hi, I am developing a simple-ish PWA that lets me keep a food diary and stay on track with my diet and health targets.

I have downloaded the big JSONL file and imported the relevant bits of the JSON into my own database (I am only interested in images, product names and brands, ingredients and nutritional info).
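To give an idea of what I mean by "relevant bits", the extraction step is roughly a filter like the one below. This is just a sketch; the field names are from memory and may not exactly match the dump:

gunzip -c openfoodfacts-products.jsonl.gz \
  | jq -c '{code, product_name, brands, ingredients_text, nutriments, image_url}' \
  > products-slim.jsonl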

I noticed that a product I added to my diary had an incorrect spelling of the product name and incorrect calorific values (the original contributor had entered the kJ figure in the kcal field), so the product was reporting far too many calories for a small portion.

I logged into Open Food Facts and corrected the spelling and nutritional info, and thought I’d just wait for the corrected data to be reflected in my database when my overnight script pulls down the JSONL file and extracts the data. I also added two new products that were not in the JSONL file.

The Open Food Facts website says the data downloads are updated nightly, but the revisions and additions haven’t appeared in last night’s update.

Is there any merit in downloading the data every day or should I do it weekly or on some other schedule?

Many thanks 🙂

Hello Steves,
The JSONL is indeed exported every night (the daily maintenance scripts start around midnight CET), but we produce many exports and they take a lot of time to generate.
Currently, the JSONL is generated around 4 PM CET, so if you want the latest dump, you should download it after that time.
You can check the date and time of the last upload with curl, by looking at the Last-Modified header:

curl --head https://openfoodfacts-ds.s3.eu-west-3.amazonaws.com/openfoodfacts-products.jsonl.gz

Which returns:

x-amz-id-2: 0+GVUzIUCRk7543wnsCJNvRPx38MH68LrVN7kpWyUWxXI8aG1uWr/syDxb6jY/uugD/Xj2W/lSA=
x-amz-request-id: FC7T5NZZ6J2AYVVS
Date: Mon, 21 Jul 2025 07:31:13 GMT
Last-Modified: Sun, 20 Jul 2025 13:51:13 GMT
ETag: "78dfb5505d8de3ae6a82214045592fe6-579"
x-amz-server-side-encryption: AES256
x-amz-version-id: THLtAR9Tgurcs2ulF5IXxvk1qIeT_iTu
Accept-Ranges: bytes
Content-Type: application/gzip
Content-Length: 9712456568
Server: AmazonS3
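If you only want to fetch the dump when it has actually changed, curl can also use that header as a time condition. A minimal sketch, assuming the previous download is kept as openfoodfacts-products.jsonl.gz in the working directory: -R stores the server’s Last-Modified time on the local file, and -z skips the transfer when the remote copy is not newer than that file:

curl -sSL -R -z openfoodfacts-products.jsonl.gz \
     -o openfoodfacts-products.jsonl.gz \
     https://openfoodfacts-ds.s3.eu-west-3.amazonaws.com/openfoodfacts-products.jsonl.gz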

In the future, we would like to perform fewer exports on the main server (ideally only MongoDB and JSONL) and delegate the generation of the other exports to a different server, which would allow reusers to get a dump much earlier in the day.

If you have any additional questions, feel free to ask!

Hi Raphael,

Thanks for the response. I’ll leave my scheduled task as it is: although the JSONL seems to be ready in the early afternoon, I’d prefer my server to process it in the small hours.
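In case it’s useful to anyone else, the scheduled task is just a cron entry along these lines (the script name and paths are placeholders for illustration):

# Download and import the JSONL at 02:00 every day, well after the ~4 PM CET export has finished
0 2 * * * /home/steve/bin/update_off_data.sh >> /home/steve/off_import.log 2>&1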

Kind regards,
S
