Better specifying the `link` field

The link field is not well specified, leading to several issues.

Semantic

What does link mean?

  • Is it the link to the official product page on the producer’s website?
  • Is it a link to the producer’s website homepage?
  • Can it be another link?

The documentation says “Link to the product page on the official site of the producer”. But in practice, a majority of the values point to the producer’s website rather than to a product page.

As of today (2025-04-04):

  • a link value has been entered for 73,500+ products
  • 37,500+ (51%) point to the producer’s website rather than to a product page (I used the following regexp to catch them: ^(https?)?(www\.)?(.*)?(\.)([^\/]*)?\/?$)
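
For anyone who wants to reproduce this kind of count, here is a minimal Python sketch that applies the regexp above unchanged. The function name `looks_like_homepage` and the sample URLs are only illustrative. Note that the pattern also matches links ending in a file name that contains a dot (for example `muesli.html`), so the 51% figure may be slightly overestimated.

```python
import re

# The regexp quoted in the post, copied unchanged.
HOMEPAGE_RE = re.compile(r"^(https?)?(www\.)?(.*)?(\.)([^\/]*)?\/?$")


def looks_like_homepage(link: str) -> bool:
    """Return True when the link matches the 'producer website' pattern."""
    return HOMEPAGE_RE.match(link.strip()) is not None


# Hypothetical sample values, not taken from the database.
samples = [
    "https://www.example.com/",             # site root
    "example.com",                          # no protocol, still a site root
    "https://example.com/products/muesli",  # product page
    "https://example.com/muesli.html",      # product page, but matched anyway
]

for link in samples:
    print(link, looks_like_homepage(link))
```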

What do we really want?

  1. any link if it’s related to the food manufacturer
  2. the official product page
  3. the home page of the producer
  4. the customer service page
  5. another link (please explain)

I would say that 3 could be the given rule, as it’s the most stable address (and it already represents 51% of the values). But we could accept other addresses linking to the producer’s website. We could regularly check the addresses for 404 errors or scam sites.

Format

As of today, 52,300+ (71%) begin with “http”.

  • https://example.com/ should be ok as it’s the complete protocol + address
  • when the address is the root of the website, the trailing slash might be automatically removed to improve comparison or aggregation
  • is www.example.com ok?

A “data quality error” facet should list bad links, e.g. gttp://bad-protocol.net.
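
To make the format questions concrete, here is a rough Python sketch of what normalization plus a data quality check could look like. It is not how Open Food Facts handles the field today. The function name `normalize_link`, the error labels, and the choices to prepend `https://` when no protocol is given and to reject hosts without a dot are all assumptions of mine.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

ALLOWED_SCHEMES = {"http", "https"}


def normalize_link(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (normalized_link, error); exactly one of the two is None."""
    value = raw.strip()
    if not value:
        return None, "empty"
    # Assumption: a value without a protocol ("www.example.com") means https.
    if "://" not in value:
        value = "https://" + value
    parts = urlsplit(value)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Candidate for the "data quality error" facet.
        return None, "bad-protocol:" + scheme
    # Assumption: a real producer host always contains at least one dot.
    if not parts.hostname or "." not in parts.hostname:
        return None, "bad-host"
    # Drop the trailing slash only when the path is the site root.
    path = "" if parts.path == "/" else parts.path
    normalized = urlunsplit(
        (scheme, parts.netloc.lower(), path, parts.query, parts.fragment)
    )
    return normalized, None


print(normalize_link("https://example.com/"))
print(normalize_link("www.example.com"))
print(normalize_link("gttp://bad-protocol.net"))
```

With rules like these, `https://example.com/` and `https://example.com` would aggregate under the same value, and `gttp://bad-protocol.net` would be reported instead of stored.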

Should we create a link_tags field to normalize the link field?

Name

Isn’t the database field name link too confusing? Shouldn’t it be producer_link?

How should it work?

I think that Open Food Facts should not endorse these links. In Wikipedia, external links use the HTML attribute rel="nofollow" to tell search engines that the site does not endorse them. I think we should do the same.
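
As an illustration of the idea only, here is a small Python sketch that renders such a link. The helper `render_external_link` is hypothetical and is not part of the Open Food Facts code base.

```python
from html import escape


def render_external_link(url: str, label: str) -> str:
    """Build an <a> tag for an external link, marked rel="nofollow"."""
    # Escape both values so that user-entered data cannot break the markup.
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(label)
    )


print(render_external_link("https://example.com/?a=1&b=2", "Producer"))
```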

Any more ideas about this field?

I would keep the link meaning as is, and have it point to the most specific page on the manufacturer’s website that describes the product, if available.

If not available, then the brand’s website could be shown, but I think it should come from the upcoming brands taxonomy, not from the product data.

I agree with all the ideas specified above and like them.

In addition to producer_link which is estimated to exist for the majority of all products, there could be product_link.

Any more ideas about this field?

To keep this reasonably stable even if the product disappears from the market, an additional reference to an archived copy on https://archive.org would be useful.

It is necessary to counteract the fast pace and short lifespan of products and provide reliable information, something that some industries/manufacturers are not (yet) ready to do.
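
A possible sketch of the archive.org idea, in Python: build a Wayback Machine address from the stored link and the date on which it was entered. The helper name `wayback_url` is made up. The URL layout (`/web/<timestamp>/<original link>`) is the one the Wayback Machine uses for captures, and as far as I know it redirects to the capture closest to the given date when one exists. Whether a capture exists at all is not checked here.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(link: str, when: date) -> str:
    """Return a Wayback Machine address for `link` around the date `when`."""
    # The timestamp is written as YYYYMMDD; the original link follows as-is.
    return f"{WAYBACK_PREFIX}/{when:%Y%m%d}/{link}"


print(wayback_url("https://example.com/muesli", date(2025, 4, 4)))
```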