
New object detection and image segmentation widgets #6

Closed
NielsRogge opened this issue Jun 4, 2021 · 32 comments
NielsRogge (Contributor) commented Jun 4, 2021

For the DETR model, which will soon be part of HuggingFace Transformers (see huggingface/transformers#11653 (comment)), it would be cool to have object detection and image segmentation (actually panoptic segmentation) inference widgets.

Similar to the image classification widget, a user should be able to upload/drag an image to the widget, which is then annotated with bounding boxes and classes (in case of object detection), or turned into a segmentation map (in case of panoptic segmentation).

Here are 2 notebooks which illustrate what you can do with the head models of DETR:

DetrForObjectDetection: https://colab.research.google.com/drive/170dlGN5s37uaYO32XKUHfPklGS8oB059?usp=sharing
DetrForSegmentation: https://colab.research.google.com/drive/1hTGTPGBLPRY1QkLmG7P9air6v04tcXUL?usp=sharing

The models are already on the hub: https://huggingface.co/models?search=facebook/detr

cc @LysandreJik

julien-c (Member) commented Jun 4, 2021

Really cool!

So I think the steps are:

  • define and agree on an API shape (it should be future-proof for other potential models with the same task). I usually try to take inspiration from the existing hosted APIs (Google Vision, etc.) that do those tasks
  • implement those models in the Inference API, or here in api-inference-community (@Narsil and team can review a first draft)
  • build a widget (<= we'll open source a current snapshot of our widget code in this repo in the next few days)

NielsRogge (Contributor, Author) commented Jun 4, 2021

Ok, I think it might make sense to divide the API into object detection, semantic segmentation, instance segmentation, and panoptic segmentation. This blog post explains the difference between semantic/instance/panoptic segmentation well.

The input/output of the various tasks is as follows:

  • object detection: input = RGB image. Output: RGB image with bounding boxes and corresponding instance labels.
  • semantic segmentation: input = RGB image. Output: per-pixel semantic class label.
  • instance segmentation: input = RGB image. Output: per-object (instance) mask and instance label.
  • panoptic segmentation: input = RGB image. Output: per-pixel semantic class + optional instance labels.
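
A minimal sketch of how those outputs could be typed on the widget side (field names are illustrative only, not a settled API):

// Sketch: possible output shapes for the tasks listed above (illustrative field names).
interface DetectedObject {
  label: string;                                                   // class name, e.g. "cat"
  score: number;                                                   // confidence in [0, 1]
  box: { xmin: number; ymin: number; xmax: number; ymax: number }; // bounding box in pixels
}

interface SegmentMask {
  label: string;      // semantic class (optionally with an instance id for instance/panoptic)
  score: number;      // confidence in [0, 1]
  mask: boolean[][];  // per-pixel membership, same height/width as the input image
}

type ObjectDetectionOutput = DetectedObject[];
type ImageSegmentationOutput = SegmentMask[];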

julien-c (Member) commented Jun 4, 2021

Agreed. Though we'll probably group all of those under a generic image-segmentation task on the hub side, for ease of use/accessibility.

@LysandreJik (Member) commented:

We just open sourced widgets in huggingface/huggingface_hub#87 if you want to take a look @NielsRogge! We'll write a document on how to get started but feel free to try it out locally!

@julien-c (Member) commented:

maybe @mishig25 can take a look at the widget side of this!

Narsil commented Jun 23, 2021

Proposed a PR on the transformers side: huggingface/transformers#12321

@mishig25 I think image manipulation will be a bit tricky to do client-side, hence I propose a "reduction" mechanism for the actual API (that goes on top of the pipeline, or within the pipeline) to simply output 1 image. What do you think?

If we kept the actual masks, we could do improved UX, with maybe mouse-hover effects and so on, but I am not 100% sure how easy that is to do in JS (doing it in Python is somewhat trivial, maybe half a day to fix all the odd issues like label placement and so on).

@julien-c (Member) commented:

Tagging @gary149 and @severo as well, but I think client-side rendering can/will be way cooler (interactivity as you mention, quality of the UX, etc.)

Also, if someone ends up calling the API outside of the widget (as in an actual programmatic use case), I don't think they will want the rendered output.

severo (Collaborator) commented Jun 23, 2021

I love doing this kind of processing in JS

Narsil commented Jun 23, 2021

Yes, API-wise we need to be able to support raw masks for sure!

@mishig25 (Collaborator) commented:

@severo @julien-c please let me know if there's anything particular you'd like me to work on. Otherwise, I can start digging more into the Visualizer module of detectron2 and see how the desired results can be achieved with JS & web interactivity.

mishig25 (Collaborator) commented Jun 25, 2021

Assuming the API output will be

[
   {
        "mask": // Array<Array<bool>>, a 2D array of booleans
        "score": // float
        "label": // str
   },
   // ...
]

In terms of visualizing masks, which option would you suggest:

  1. <canvas>-based approach
  2. <img>-based approach that uses CSS property mask-image
  3. something else entirely

Unless there is an objection to using the <canvas> element, I think the <canvas>-based approach will be the most straightforward (I might be wrong).

severo (Collaborator) commented Jun 25, 2021

I think it's the best approach too

julien-c (Member) commented Jun 25, 2021

I'm not a pro at frontend drawing technologies: what are the pros & cons of SVG vs. canvas? What about WebGL, maybe via three.js? 🤯

(just out of curiosity!)

severo (Collaborator) commented Jun 25, 2021

SVG is really meant for vector drawings. We can incorporate bitmap images in it, but it's not really natural, and you'd have to generate the images anyway (using canvas).
Managing mask images is naturally done by modifying the pixels of a canvas (see https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Pixel_manipulation_with_canvas for example, or https://observablehq.com/@severo/voronoi-stippling-on-elevation-dem).
Re: WebGL/three.js... why not! But it's a bit the same as SVG: we have to generate the textures, then apply them to the 3D geometries (like in https://observablehq.com/@severo/voronoi-cloth).
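
A rough sketch of that pixel-manipulation route, assuming the Array<Array<bool>> mask shape proposed above (illustrative only, not a settled API):

// Sketch: paint a boolean mask as a semi-transparent colored overlay on a <canvas>.
// The <canvas> is assumed to sit as an overlay positioned above the image element;
// the browser composites the two when rendering.
function drawMask(
  canvas: HTMLCanvasElement,
  mask: boolean[][],
  rgba: [number, number, number, number] = [255, 0, 0, 128]
): void {
  const height = mask.length;
  const width = height > 0 ? mask[0].length : 0;
  canvas.width = width;
  canvas.height = height;

  const ctx = canvas.getContext("2d");
  if (!ctx || width === 0 || height === 0) return;

  const imageData = ctx.createImageData(width, height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y][x]) {
        const i = (y * width + x) * 4;
        imageData.data[i] = rgba[0];     // R
        imageData.data[i + 1] = rgba[1]; // G
        imageData.data[i + 2] = rgba[2]; // B
        imageData.data[i + 3] = rgba[3]; // A (semi-transparent so the image shows through)
      }
    }
  }
  ctx.putImageData(imageData, 0, 0);
}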

@mishig25 (Collaborator) commented:

Widget-wise:
SVG would be better suited for object detection (bounding boxes), as sketched below.
WebGL/three.js would be better suited for anything that relates to 3D or complex 2D graphics (like 3D object generation from images).
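
A rough sketch of the SVG route for a single labeled bounding box (the xmin/ymin/xmax/ymax box fields are illustrative, not a settled API):

// Sketch: append one labeled <rect> to an <svg> overlay sized to the image.
function drawBox(
  svg: SVGSVGElement,
  box: { xmin: number; ymin: number; xmax: number; ymax: number },
  label: string,
  color = "red"
): void {
  const ns = "http://www.w3.org/2000/svg";

  const rect = document.createElementNS(ns, "rect");
  rect.setAttribute("x", String(box.xmin));
  rect.setAttribute("y", String(box.ymin));
  rect.setAttribute("width", String(box.xmax - box.xmin));
  rect.setAttribute("height", String(box.ymax - box.ymin));
  rect.setAttribute("fill", "none");
  rect.setAttribute("stroke", color);
  rect.setAttribute("stroke-width", "2");
  svg.appendChild(rect);

  const text = document.createElementNS(ns, "text");
  text.setAttribute("x", String(box.xmin));
  text.setAttribute("y", String(Math.max(box.ymin - 4, 10))); // keep the label inside the viewport
  text.setAttribute("fill", color);
  text.textContent = label;
  svg.appendChild(text);
}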

mishig25 (Collaborator) commented Jun 30, 2021

I'll create a draft PR once I refactor the code.
Assuming the API output will be [{mask, score, label}, ...] and based on @gary149's feedback, here is a screenshot (please add any other feedback as a comment below):
[screenshot: image segmentation widget output]

Currently it supports highlighting on mouseover; should I add a mobile version of it, like highlighting on touch/tap?

@osanseviero (Member) commented:

😮

@julien-c (Member) commented:

looks really cool!

@NielsRogge (Contributor, Author) commented:

Wow, super cool!! Really nice to see the cats picture still going strong haha (it's part of the COCO evaluation set; they are not my cats, sadly).

I should add the id2label mapping to the config file, so that we can see the actual labels instead of LABEL_93 and so on.

mishig25 (Collaborator) commented Jul 7, 2021

Pushed updates in branches widget-image-segmentation & widget-object-detection

Updates:

  1. WIP/implement object detection widget
    Assuming the API input will be identical to that of ImageClassificationWidget and the API output will be
[
   {
        "boundingBox": // Array<{x: number, y: number}>, the 4 corner vertices of the bounding box
        "score": // float
        "label": // str
   },
   // ...
]
  2. CSS fadeIn/fadeOut transitions when masks/boundingBoxes change visibility on user interaction
  3. Subtle darkening of WidgetOutputChart when switched to dark mode

Screenshots:

[screenshots: image segmentation and object detection widgets]

@osanseviero (Member) commented:

FYI @nateraw

mishig25 (Collaborator) commented Jul 20, 2021

@Narsil and I are discussing whether image seg & obj det should be the same (identical) pipeline or two different pipelines.

Reasons to treat them as the same pipeline:
  1. The tasks are related
  2. It limits the number of unique pipelines & widgets
  3. In this case, we would treat bounding boxes from obj det as masks from image seg. In other words, obj det becomes a special case of image seg.
Reasons to treat them as different pipelines:
  1. The outputs (masks vs. bounding boxes) differ enough to warrant a unique widget for each task (image seg & obj det)

Please let us know which option you would prefer and why, @julien-c @NielsRogge @osanseviero @severo, so we can reach a consensus.

@julien-c (Member) commented:

I would treat them as different pipelines/widgets; I think it's clearer.

But it depends on whether a single model/checkpoint can output both representations at the same time, in one forward pass? (My understanding was no.)

NielsRogge (Contributor, Author) commented Jul 21, 2021

I would also treat them differently, as they are quite separate tasks.

Object detection is fairly simple: given an image, predict class labels + corresponding bounding boxes.

However, image segmentation has different subtasks: [attached image: 20210721_094012.jpg]

I wonder whether all of these can be supported by a general image segmentation pipeline, or whether we should create one for every subtask. I am also wondering about the names of the head models: for now I have called DETR's panoptic segmentation model DetrForSegmentation, but it might be more appropriate to call it DetrForImageSegmentation (if we join all subtasks into one) or DetrForPanopticSegmentation (if we decide to split up the different subtasks).

Currently I'm working on another model, SegFormer, a semantic segmentation model that predicts a label per pixel. So here too I'm wondering what to call the head model: SegFormerForImageSegmentation, or SegFormerForSemanticSegmentation?

Image segmentation seems to take all kinds of exotic forms; for example, last week a paper by Facebook AI came out called "Per-Pixel Classification is Not All You Need for Semantic Segmentation". So even for semantic segmentation, there are different ways to solve the problem. Edit: reading the abstract, it seems fairly simple: they predict a binary mask per label rather than doing per-pixel classification.

Curious to hear your thoughts on all of this. I guess I should do a deep dive into image segmentation, because I'm coming from NLP.

Narsil commented Jul 21, 2021

Hi,

Bounding boxes are really the same as segmentation to me; it's just that the output can be simplified to rectangles.
You are, after all, declaring part of the image as belonging to a certain class. The fact that it is a rectangle shouldn't really matter to a user.
The parallel in NLP is NER vs POS, which are really identical and were correctly grouped under token-classification in transformers.

Image segmentation is really multi-class classification PER pixel, so a general list of masks + labels should cover all the potential needs (a single pixel can be covered by multiple masks). (It's equivalent to a list of classes per pixel.)

For instance-aware + part-aware segmentation, one simply needs to add some form of dependency between the parts (everything is most likely a tree, so a simple "parent" link should cover all cases there).
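
A minimal sketch of what such a generalized output could look like (field names are my own, illustrative only):

// Sketch: one generic segment type covering semantic, instance, panoptic
// and part-aware segmentation via an optional "parent" link.
interface Segment {
  id: number;         // unique id of this segment within the response
  label: string;      // semantic class, e.g. "person" or "arm"
  score: number;      // confidence in [0, 1]
  mask: boolean[][];  // per-pixel membership; a pixel may belong to several segments
  parent?: number;    // id of the enclosing segment (e.g. a part pointing to its instance)
}

type ImageSegmentationOutput = Segment[];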

@mishig25 (Collaborator) commented:

Aggregating all the opinions expressed above, could I preliminarily conclude that:

  1. Separate pipelines for: [object-detection, image-segmentation]
    Points considered:
    a. Outputs different enough to be separate tasks (mask:image vs box:array of 4 vertices)
  2. Only one general image segmentation pipeline for seg_subtasks: [semantic, panoptic, part, ...]
    Points considered:
    a. Ease-of-use/accessibility
    b. All subtasks can be covered with a general image seg task

One possibility is to have one pipeline (for image seg & object detection) but two different widgets for visualizing masks vs. boxes.
However, ideally we'd keep a 1-1 relationship between pipelines & widgets.

Also, do we have to consider size differences when treating object detection outputs as masks vs. boxes? A box (an array of 4 {x: int, y: int} points) would be much smaller than a mask (which is image data). Is this difference in size significant enough to matter?
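
For rough scale (a back-of-envelope estimate, assuming an uncompressed boolean mask): a mask for a 640×480 image is 640 × 480 = 307,200 values, versus 8 numbers for a box (four {x, y} pairs), so masks are several orders of magnitude larger before any compression or run-length encoding.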

Please let me know any thoughts 👍

@julien-c (Member) commented:

Yes, let's do two distinct widgets on the frontend side, and I would also tend to do two distinct pipelines 👍

@mishig25 (Collaborator) commented:

I've uploaded a demo with hardcoded inputs and outputs for the object detection widget here (until we figure out the pipeline):
https://6102a74c4d4db912930e6357--huggingface-widgets.netlify.app/
Please provide feedback on anything: interaction, colors, etc.

@julien-c (Member) commented:

my only feedback is that I...

LOVE IT 🔥

severo (Collaborator) commented Jul 30, 2021

Excellent!

Feedback: I'm wondering whether the bars and the bounding boxes use the same base colors. Not sure, but it seems like the bounding boxes use the browser's base colors ('red', 'blue', etc.) while the bars use the Tailwind CSS base colors.

@mishig25 (Collaborator) commented:

@severo that's great feedback! That's indeed how it's currently done (see here and here): if the bounding box is red, then the bar/label is red-400. I'll update the bounding boxes to use color-400 as well 👍

LysandreJik transferred this issue from huggingface/huggingface_hub on Mar 16, 2022
osanseviero (Member) commented Mar 17, 2022

Since these pipelines and widgets have been merged, I'll close this issue.
