Taxonomy Editor - GSoC 2022

aadarshram · May 30, 2022, 4:36pm

Starting a discussion thread for anyone who would like to help improve this project!

Hi, I’m Aadarsh, a freshman studying Computer Science at the National Institute of Technology, Tiruchirappalli. I’ll be working on building a Taxonomy Editor for GSoC 2022 this summer.

The project is at its planning stage, and we’d love your feedback on it.
Some details about the project are given below:

Requirement Analysis

Currently, we have set the following requirements for the application:

Create a backend for parsing taxonomies
Implement CRUD of elements in a taxonomy
Provide quick search of a component
Build hierarchy visualizations for taxonomies
Implement validation of taxonomy modifications

Architecture

github.com/openfoodfacts/taxonomy-editor

Overall Architecture and Design

opened 06:45PM - 27 May 22 UTC

aadarsh-ram

## Initial idea

This taxonomy editor will mostly comprise of a decoupled front-end and back-end. A database for storing taxonomies in JSON format (such as MongoDB) will be investigated and used for this application.

### Backend

- [ ] Parses taxonomies and converts it to JSON format and stores it in the database.
- [ ] Validation of a taxonomy modification is done by this API.
- [ ] After approval from a maintainer, JSON is reconverted back to native .txt format and updated in Product Opener.
- [ ] **Tech stack** - Python + FastAPI (not finalized)

### Frontend

- [ ] Taxonomy structure using a "file hierarchy" type visualization, will be displayed.
- [ ] CRUD of components in a taxonomy in the visualization will be allowed here.
- [ ] Searching of a component in a taxonomy will be implemented here.
- [ ] Existing implementations for visualizing DAG's will be attempted to be integrated here.
- [ ] **Tech stack** - Not finalized

Any other features required will be added to this issue. Other ideas regarding this architecture are also welcome!

This issue describes the initial architecture planned for this project.
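
To make the "parse to JSON" step of the backend more concrete, here is a rough sketch in Python. It assumes a simplified text format that is only an illustration (blank lines separate entries, a line starting with `<` names a parent, and `lang:name, synonym` lines give names per language). The function name `parse_taxonomy` and the sample data are hypothetical, and the real parser may need to handle much more.

```python
import json

# Hypothetical, simplified taxonomy text (an assumption for this sketch):
# blank lines separate entries, "<lang:name" names a parent,
# and "lang:name, synonym" lines list names for one language.
SAMPLE = """\
en:Dairy, Dairies
fr:Produits laitiers

<en:Dairy
en:Cheese, Cheeses
fr:Fromage
"""


def parse_taxonomy(text):
    """Convert the simplified text format into a list of JSON-ready dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {"parents": [], "names": {}}
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip empty lines and comments
            if line.startswith("<"):
                # Parent reference, e.g. "<en:Dairy"
                entry["parents"].append(line[1:].strip())
            else:
                # Names for one language, e.g. "en:Cheese, Cheeses"
                lang, _, names = line.partition(":")
                entry["names"][lang] = [n.strip() for n in names.split(",")]
        entries.append(entry)
    return entries


ENTRIES = parse_taxonomy(SAMPLE)
AS_JSON = json.dumps(ENTRIES, ensure_ascii=False, indent=2)
print(AS_JSON)
```

Documents shaped like these could then be stored in whichever JSON-friendly database is chosen.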

Feedback

We would love to hear your feedback on any other requirements we should add, or about any workflows that you follow while editing taxonomies.

Any ideas regarding the architecture and design are much appreciated.

Feel free to drop your suggestions in this thread!

alex · June 1, 2022, 9:32am

On the visualization front, I didn’t find anything interesting and dedicated to DAGs that isn’t too complex for our job (and that doesn’t look too much like a graph, which is messy). So I guess the folder hierarchy metaphor is OK, although incomplete? Maybe we could explore what genealogy visualization software does, as it is kind of an extreme case of that. The closest thing is class hierarchy diagrams for software, but I think that’s a bit complex too.

I would maybe build something akin to
[Screenshot from 2022-06-01 11-25-59]
(source: GitHub - antvis/XFlow)

but where the arrows could cross (because of multiple parents / children); they would only be indicative.

What matters more is arranging the entry’s parents in levels by depth (from top to bottom), with siblings on the same level and children underneath, also by depth.

To compute the “level”, my intuition is that the longest path from the top to the element is the best choice (it’s better to have a long arrow than an element placed above one of its parents).
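
As a rough illustration of the longest-path idea, here is a small Python sketch. The `PARENTS` mapping and the entry names are made up for the example, and the code assumes the taxonomy has no cycles.

```python
# Hypothetical sample data: each entry maps to its direct parents.
PARENTS = {
    "food": [],
    "dairy": ["food"],
    "cheese": ["dairy"],
    "vegan food": ["food"],
    "vegan cheese": ["cheese", "vegan food"],
}


def compute_levels(parents):
    """Return {entry: level}, the length of the longest path from a root.

    Assumes the graph is acyclic. Uses plain recursion with a cache,
    which is fine for a sketch but may need an iterative version
    for very deep taxonomies.
    """
    levels = {}

    def level_of(node):
        if node not in levels:
            # Roots have no parents, so max() falls back to -1 and they get 0.
            levels[node] = 1 + max(
                (level_of(p) for p in parents[node]), default=-1
            )
        return levels[node]

    for node in parents:
        level_of(node)
    return levels


LEVELS = compute_levels(PARENTS)
print(LEVELS)
```

Here "vegan cheese" is reachable in two steps through "vegan food" and in three through "dairy" and "cheese"; taking the longest path puts it at level 3, so it is never drawn above "cheese".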

You would of course never display the whole taxonomy, only the elements surrounding a specific entry.

aadarshram · June 1, 2022, 5:59pm

XFlow seems cool! I’ll look into it and check how well it fits our use case. Thanks for the suggestion!

alex · June 8, 2022, 1:16pm

@aadarshram, after speaking with @charlesnepote, I understand we should not focus so much on visualizing the whole graph, but rather a single item.

That is, the main page should be the one centered on a single item.

On that page we could put translations, synonyms, and properties.

Charles suggests there are simple tools that would help a lot, like building the URL to Wikipedia, and other nice features: knowing whether the category is present in Open Food Facts, how many entries it has, etc.
A list of entries that may have words in common could also be useful.
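
For the Wikipedia link, a minimal sketch could look like the following. The helper name `wikipedia_url` is hypothetical, and it assumes the entry name happens to match an article title, which would not always be true in practice.

```python
from urllib.parse import quote


def wikipedia_url(name, lang="en"):
    """Build a Wikipedia article URL from an entry name.

    Assumption: the entry name is also the article title.
    Spaces become underscores and other characters are percent-encoded.
    """
    title = quote(name.strip().replace(" ", "_"), safe="")
    return f"https://{lang}.wikipedia.org/wiki/{title}"


print(wikipedia_url("Cheddar cheese"))
print(wikipedia_url("Crème fraîche", lang="fr"))
```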

Then we could add parents and children.

It seems to me we could visualize ancestors and descendants (but only for this entry) as two distinct “rail/metro” graphs, like the ones often used for git:
[Screenshot from 2022-06-08 15-06-13]
(taken from GitHub - nicoespeon/gitgraph.js: 👋 [Looking for maintainers] - A JavaScript library to draw pretty git graphs in the browser)

We may decide on a maximum number of displayed descendants / ancestors to keep things manageable.
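
A possible way to collect a capped set of ancestors or descendants is a breadth-first walk that stops at the threshold, so the nearest entries are kept first. This is only a sketch: the sample data, the `neighbourhood` helper, and the `limit` parameter are all made up for illustration.

```python
from collections import deque

# Hypothetical sample data: each entry maps to its direct parents.
PARENTS = {
    "food": [],
    "dairy": ["food"],
    "cheese": ["dairy"],
    "vegan food": ["food"],
    "vegan cheese": ["cheese", "vegan food"],
}

# Invert the parent links to get the direct children of each entry.
CHILDREN = {entry: [] for entry in PARENTS}
for entry, entry_parents in PARENTS.items():
    for parent in entry_parents:
        CHILDREN[parent].append(entry)


def neighbourhood(start, edges, limit):
    """Breadth-first walk from `start`, returning at most `limit` entries.

    Pass PARENTS as `edges` for ancestors, CHILDREN for descendants.
    The nearest entries come first; `start` itself is not included.
    """
    seen = {start}
    result = []
    queue = deque([start])
    while queue and len(result) < limit:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                result.append(nxt)
                queue.append(nxt)
                if len(result) == limit:
                    break
    return result


ANCESTORS = neighbourhood("vegan cheese", PARENTS, limit=10)
CAPPED_ANCESTORS = neighbourhood("vegan cheese", PARENTS, limit=2)
DESCENDANTS = neighbourhood("food", CHILDREN, limit=10)
print(ANCESTORS, CAPPED_ANCESTORS, DESCENDANTS)
```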

Using this method, we won’t have to visualize the whole graph, yet one could still reach an entry:

either by using the search
or by navigating from an entry to one of its descendants / ancestors
and we could of course provide a list of “root” nodes and, maybe, “leaf” nodes, last edited entries, etc.
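
The “root” and “leaf” lists could be derived directly from the parent links. A small sketch, again with made-up sample data and a hypothetical helper name:

```python
# Hypothetical sample data: each entry maps to its direct parents.
PARENTS = {
    "food": [],
    "dairy": ["food"],
    "cheese": ["dairy"],
    "vegan food": ["food"],
    "vegan cheese": ["cheese", "vegan food"],
}


def roots_and_leaves(parents):
    """Return (roots, leaves): entries without parents, entries without children."""
    # Every entry that appears as someone's parent has at least one child.
    has_children = {p for ps in parents.values() for p in ps}
    roots = sorted(entry for entry, ps in parents.items() if not ps)
    leaves = sorted(entry for entry in parents if entry not in has_children)
    return roots, leaves


ROOTS, LEAVES = roots_and_leaves(PARENTS)
print(ROOTS, LEAVES)
```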

I think it’s a good pragmatic idea to center our first interface around the taxonomy “entry” page; it is less risky.

alex · July 20, 2022, 3:48pm

Just commenting here that @pierre.slamich has done Figma mockups: Démo Day Saison 10 - YouTube

We are concentrating on editing first; search and navigation will be handled later on.