New object detection and image segmentation widgets #6
Comments
Really cool! So I think the steps are:
Ok, I think it might make sense to divide the API into object detection, semantic segmentation, instance segmentation, and panoptic segmentation. This blog post explains the difference between semantic/instance/panoptic segmentation well. The input/output of the various tasks is as follows:
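For reference, here is a rough sketch of those inputs/outputs as TypeScript types. This is my paraphrase of the distinction in the linked blog post; the type names are illustrative, not the actual API:

```ts
// Hedged sketch of the per-task outputs (illustrative names, not the real API).
type Box = { xmin: number; ymin: number; xmax: number; ymax: number };

// Object detection: image -> labeled boxes.
type ObjectDetectionOutput = { label: string; score: number; box: Box }[];

// Semantic segmentation: image -> one class label per pixel.
type SemanticSegmentationOutput = string[][];

// Instance segmentation: image -> one mask per detected (countable) instance.
type InstanceSegmentationOutput = { label: string; score: number; mask: boolean[][] }[];

// Panoptic segmentation: image -> every pixel gets a (label, instance id).
type PanopticSegmentationOutput = { label: string; instanceId: number }[][];
```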
Agreed. Though we'll probably group all of those under a generic image-segmentation task on the Hub side for ease of use/accessibility.

We just open-sourced widgets in huggingface/huggingface_hub#87 if you want to take a look @NielsRogge! We'll write a document on how to get started, but feel free to try it out locally!

Maybe @mishig25 can take a look at the widget side of this!

Proposed PR on @mishig25 I think image manipulation will be a bit tricky to do client side, hence I propose a "reduction" mechanism for the actual API (that goes on top of the pipeline or within the pipeline) to simply output 1 image. What do you think? If we kept the actual masks, we could do improved UX with maybe mouse-hover effects and so on, but I am not 100% sure how easy that is to do in JS (doing it in Python is somewhat trivial, maybe half a day to fix all the odd issues like label placement and so on).

Tagging @gary149 and @severo as well, but I think client-side rendering can/will be way cooler (interactivity as you mention, quality of the UX, etc.). Also, if someone ends up calling the API outside of the widget (like in an actual programmatic use case), I don't think they will want the rendered output.

I love doing this kind of processing in JS.

Yes, API-wise we need to be able to support raw masks for sure!

@severo @julien-c please let me know if there's anything particular you'd like me to work on. Otherwise, I can start digging more into
Assuming the API output will be:

```js
[
  {
    "mask": ...,  // Array<Array<Bool>>, a 2D array of bool
    "score": ..., // float
    "label": ..., // str
  },
  // ...
]
```

In terms of visualizing masks, which option would you suggest:
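For concreteness, one way to visualize such a mask client-side would be to rasterize it onto a transparent canvas overlaid on the image. A minimal hypothetical sketch, assuming the Array<Array<Bool>> format above (the function name and color parameter are mine, not from any branch):

```ts
// Paint a 2D boolean mask onto an overlay canvas as a semi-transparent color.
// Hypothetical sketch; `drawMask` and its signature are illustrative.
function drawMask(
  canvas: HTMLCanvasElement,
  mask: boolean[][], // mask[y][x], same dimensions as the canvas
  [r, g, b]: [number, number, number]
): void {
  const ctx = canvas.getContext("2d");
  if (!ctx) return;
  const { width, height } = canvas;
  const imageData = ctx.createImageData(width, height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y][x]) {
        const i = (y * width + x) * 4;
        imageData.data[i] = r;
        imageData.data[i + 1] = g;
        imageData.data[i + 2] = b;
        imageData.data[i + 3] = 128; // ~50% alpha so the photo shows through
      }
    }
  }
  ctx.putImageData(imageData, 0, 0);
}
```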
Unless there is an objection to using

I think it's the best approach too.

I'm not a pro at frontend drawing technologies; what are the pros & cons of SVG vs. canvas? What about WebGL, maybe via (just out of curiosity!)
SVG is really meant for vector drawings. We can incorporate bitmap images in it, but it's not really natural, and you have to generate the images anyway (using canvas).
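To illustrate the point: even in an SVG-based widget, the mask bitmap itself would be rasterized with a canvas first and then embedded, e.g. as a data URL. A hypothetical sketch (the helper name is mine):

```ts
// Embed a canvas-rendered bitmap (e.g. a rasterized mask) into an SVG
// as an <image> node. Hypothetical helper, not from the widget code.
function canvasToSvgImage(canvas: HTMLCanvasElement): SVGImageElement {
  const img = document.createElementNS("http://www.w3.org/2000/svg", "image");
  img.setAttribute("href", canvas.toDataURL("image/png"));
  img.setAttribute("width", String(canvas.width));
  img.setAttribute("height", String(canvas.height));
  return img;
}
```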
Widget-wise,

I'll create a draft PR once I refactor the code. Currently, it supports highlight on

😮

Looks really cool!

Wow, super cool!! Really nice to see the cats picture still going strong, haha (it's part of COCO evaluation; they are not my cats, sadly). I should add the
Pushed updates in branches widget-image-segmentation & widget-object-detection. Updates:

```js
[
  {
    "boundingBox": ..., // Array<{x: number, y: number}>, the 4 corner vertices of the bounding box
    "score": ...,       // float
    "label": ...,       // str
  },
  // ...
]
```
Screenshots:
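As a sketch of how a widget might render one entry of the boundingBox format above on a canvas (hypothetical helper, not the actual branch code):

```ts
// Draw one detection (4-corner polygon + label/score) on a canvas.
// Hypothetical sketch; names and styling are illustrative.
interface Detection {
  boundingBox: { x: number; y: number }[]; // 4 corner vertices
  score: number;
  label: string;
}

function drawDetection(ctx: CanvasRenderingContext2D, det: Detection): void {
  const [first, ...rest] = det.boundingBox;
  ctx.beginPath();
  ctx.moveTo(first.x, first.y);
  for (const { x, y } of rest) ctx.lineTo(x, y);
  ctx.closePath();
  ctx.strokeStyle = "red";
  ctx.lineWidth = 2;
  ctx.stroke();
  // Label and score just above the first corner.
  ctx.font = "14px sans-serif";
  ctx.fillStyle = "red";
  ctx.fillText(`${det.label} (${det.score.toFixed(2)})`, first.x, first.y - 4);
}
```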
FYI @nateraw |
@Narsil and I are discussing whether object detection and image segmentation should be handled as the same pipeline or as two different ones.

Reasons to treat them as the same pipeline:

Reasons to treat them as different pipelines:

Please let us know which option you would prefer and why @julien-c @NielsRogge @osanseviero @severo, and we can reach a consensus.
I would treat them as different pipelines/widgets, I think it's clearer. But it depends on whether a single model/checkpoint can output both representations at the same time, in one model forward pass? (My understanding was no.)
I would also treat them differently, as they are quite separate tasks. Object detection is fairly simple: given an image, predict class labels + corresponding bounding boxes. However, image segmentation has different subtasks: I wonder whether all of these can be supported by a general image segmentation pipeline, or whether we should create one for every subtask.

I am also wondering about the names of the head models: for now I have called DETR's panoptic segmentation model DetrForSegmentation, but it might be more appropriate to call it DetrForImageSegmentation (if we join all subtasks into one) or DetrForPanopticSegmentation (if we do decide to split up the different subtasks). Currently I'm working on another model, SegFormer, a semantic segmentation model which predicts a label per pixel. So here too, I'm wondering what to call the head model: SegFormerForImageSegmentation, or SegFormerForSemanticSegmentation?

Image segmentation seems to take all kinds of exotic forms; for example, last week a paper by Facebook AI came out called "Per-Pixel Classification is Not All You Need for Semantic Segmentation". So even for semantic segmentation, there are different ways to solve the problem. Edit: reading the abstract, it seems fairly simple, they predict a binary mask per label rather than doing per-pixel classification.

Curious about hearing your thoughts on all of this. I guess I should do a deep dive into image segmentation, because I'm coming from NLP.
Hi,

Bounding boxes are really the same as segmentation to me; it's just that the output can be simplified to squares. Image segmentation is really multi-classification PER pixel, so a general list of masks + labels should cover all the potential needs (every mask can hit a single pixel multiple times; it's equivalent to a list of classes per pixel). For instance-aware + part-aware, one simply needs to add some form of dependency between the parts (everything is most likely a tree, so a simple "parent" link should cover all cases there).
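If I read this right, the unified output could look something like the following sketch (field names are my guess, not a settled schema):

```ts
// One segment per entry; masks may overlap, and an optional parent index
// expresses instance-/part-aware hierarchies (e.g. "wheel" -> "car").
// Hypothetical schema, not a settled API.
interface Segment {
  mask: boolean[][]; // per-pixel membership
  label: string;
  score: number;
  parent?: number; // index of the enclosing segment, if any
}

type ImageSegmentationOutput = Segment[];

// A bounding box is then just the special case where the mask is a filled rectangle.
```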
Aggregating all the opinions expressed above, could I preliminarily conclude that:
One possibility is to have one pipeline (for image segmentation & object detection), but 2 different widgets for visualizing masks vs. boxes. Also, do we have to consider size differences in treating object detection outputs as masks? Please let me know your thoughts 👍
Yes, let's do two distinct widgets on the frontend side, and I would also tend to do two distinct pipelines 👍
I've uploaded a demo with hardcoded inputs and outputs for the object detection widget here (until we figure out the pipeline): |
My only feedback is that I... LOVE IT 🔥
Excellent! Feedback: I'm wondering if the bars and the bounding boxes use the same base colors? Not sure, but it seems like the bounding boxes use the browser's base colors ('red', 'blue', etc.) while the bars use the Tailwind CSS base colors.
Since these pipelines and widgets are merged, I'll close this issue.
For the DETR model, which will soon be part of HuggingFace Transformers (see huggingface/transformers#11653 (comment)), it would be cool to have object detection and image segmentation (actually panoptic segmentation) inference widgets.
Similar to the image classification widget, a user should be able to upload/drag an image to the widget, which is then annotated with bounding boxes and classes (in case of object detection), or turned into a segmentation map (in case of panoptic segmentation).
Here are 2 notebooks which illustrate what you can do with the head models of DETR:

- DetrForObjectDetection: https://colab.research.google.com/drive/170dlGN5s37uaYO32XKUHfPklGS8oB059?usp=sharing
- DetrForSegmentation: https://colab.research.google.com/drive/1hTGTPGBLPRY1QkLmG7P9air6v04tcXUL?usp=sharing

The models are already on the hub: https://huggingface.co/models?search=facebook/detr
cc @LysandreJik