New normalization of barcodes

alex · October 23, 2024, 8:20am

Hello everyone, over the last few weeks we have been deduplicating many products that were present multiple times in the database under ids (barcodes) that differed only in the number of leading 0s (e.g. 104803002624 and 0104803002624). Some barcode scanners also add leading 0s themselves (e.g. turning 12-digit UPC codes into 13-digit EAN-13 codes).
To avoid this issue, we changed the normalization rules we apply to barcodes, so that each product has a unique barcode with a predictable number of digits and leading 0s. The rules are documented in openfoodfacts-server/docs/api/ref-barcode-normalization.md (main branch of the openfoodfacts/openfoodfacts-server repository on GitHub).
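For readers who store or compare barcodes on their side, here is a minimal Python sketch of this kind of normalization. It assumes a rule of the form "strip all leading 0s, then left-pad codes of 1 to 7 digits to 8 digits and codes of 9 to 12 digits to 13 digits". That rule and the function name `normalize_barcode` are assumptions for illustration only; ref-barcode-normalization.md remains the authoritative description.

```python
def normalize_barcode(code: str) -> str:
    """Normalize a numeric barcode (sketch under ASSUMED rules, see lead-in)."""
    code = code.strip()
    if not code.isdigit():
        # This sketch leaves anything that is not purely numeric unchanged.
        return code
    # Remove all leading 0s so that variants of the same code compare equal.
    stripped = code.lstrip("0")
    if len(stripped) <= 7:
        # Assumed: short codes are padded to 8 digits (EAN-8 length).
        return stripped.zfill(8)
    if 9 <= len(stripped) <= 12:
        # Assumed: 9 to 12 digit codes (e.g. UPC-A) are padded to 13 digits
        # (EAN-13 length).
        return stripped.zfill(13)
    # 8-digit and 13+ digit codes are kept without leading 0s.
    return stripped


if __name__ == "__main__":
    for raw in ("104803002624", "0104803002624"):
        print(raw, "->", normalize_barcode(raw))
```

Under these assumed rules, both 104803002624 and 0104803002624 from the example above map to 0104803002624, so a 12-digit UPC that a scanner has padded to 13 digits ends up with the same id as the unpadded UPC.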
If you use the API, read and write queries should continue to work as before, whichever form of the barcode you use.
If you use the database dumps or CSV exports, note that some codes may now have a different number of leading 0s than before. It is best to download a new full export.
In addition, we changed the paths of images for products with short barcodes, so that we no longer have a huge number of directories at the root of the product images directory. The documentation on computing image URLs has been updated: openfoodfacts-server/docs/api/how-to-download-images.md (main branch of the openfoodfacts/openfoodfacts-server repository on GitHub).
The old image paths will continue to work for several months, but if you compute image URLs yourselves, please update your code to use the new paths.
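If you build image URLs in your own code, the Python sketch below shows one way to compute the folder path. It assumes that the barcode is left-padded with 0s to 13 digits and then split into groups of 3, 3 and 3 digits plus the remaining digits, and that images are served under https://images.openfoodfacts.org/images/products/. The function names and the `1.jpg` file name are hypothetical; please check how-to-download-images.md for the exact rules before relying on this.

```python
import re

# ASSUMED base URL for product images (check how-to-download-images.md).
IMAGES_BASE_URL = "https://images.openfoodfacts.org/images/products"

# Split a padded barcode into 3 + 3 + 3 + remaining digits.
_FOLDER_SPLIT = re.compile(r"^(\d{3})(\d{3})(\d{3})(\d+)$")


def product_image_folder(code: str) -> str:
    """Return the (assumed) image folder path for a normalized barcode."""
    # Left-pad short barcodes with 0s so every code has at least 13 digits;
    # this keeps short codes out of the root of the images directory.
    padded = code.zfill(13)
    match = _FOLDER_SPLIT.match(padded)
    if match is None:
        raise ValueError(f"not a numeric barcode: {code!r}")
    return "/".join(match.groups())


def product_image_url(code: str, filename: str) -> str:
    """Build a full image URL from a barcode and an image file name."""
    return f"{IMAGES_BASE_URL}/{product_image_folder(code)}/{filename}"


if __name__ == "__main__":
    print(product_image_folder("3435660768163"))  # 343/566/076/8163
    print(product_image_folder("12345678"))       # 000/001/234/5678
    print(product_image_url("12345678", "1.jpg"))
```

With this layout, an 8-digit code lands in a nested folder such as 000/001/234/5678 instead of a top-level directory of its own, which is the kind of change described above.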
All products have been migrated to the new normalized barcodes and image paths, and we deduplicated the conflicting products, keeping the ones that had the most data.
We did our best to make this change as transparent as possible for most users, but if you find issues or have questions, please let us know.