Add Image feature #3163
Conversation
Thanks a lot for working on this!! Looks great so far.
- I would prefer channels first by default.
- I don't think animated gifs make sense to go inside `Image`, but could be convinced otherwise.
- Would be cool to apply the Image feature to the datasets it makes sense to add to (beans, cats_vs_dogs, etc.), but this can also be done in a separate PR later if that's easier to manage.
Awesome, looking forward to using it :)
It looks all good! And thanks for also adding all those tests :)
I played with it and it works wonderfully
I added some comments, mostly to add some typing.
I also found one use case that we can fix later (because it also affects the Audio feature anyway): if you have a dataset with an Image or Audio type and you map with the identity function, then it's replaced with a string type.
I'm marking this PR as ready for review. Thanks to @sgugger's comment, the API is much more flexible now as it decodes images (lazily) as `PIL.Image.Image` objects. Also, we no longer return paths explicitly (previously, we would return the image paths).
(Btw, for the images stored locally, you can access their paths with …)

Examples of what you can do:

```python
# load local images
dset = Dataset.from_dict({"image": [local_image_path]}, features=Features({"image": Image()}))

# load remote images (we got this for free by adding support for streaming)
dset = Dataset.from_dict({"image": [image_url]}, features=Features({"image": Image()}))

# from np.ndarray
dset = Dataset.from_dict({"image": [np.array(...)]}, features=Features({"image": Image()}))

# cast column
dset = Dataset.from_dict({"image": [local_image_path]})
dset = dset.cast_column("image", Image())

# automatic type inference
dset = Dataset.from_dict({"image": [PIL.Image.open(local_image_path)]})

# transforms
def img_transform(example):
    ...
    example["image"] = transformed_pil_image_or_np_ndarray
    return example

dset = dset.map(img_transform)

# transform that adds a new column with images (automatic inference of the feature type)
dset = dset.map(lambda ex: {"image_resized": ex["image"].resize((100, 100))})
print(dset.features["image_resized"])  # prints Image()
```

Some more cool features:
Hints to make reviewing easier:
P.S. I'll fork the PR branch and start adding the Image feature to the existing image datasets (will also update the …).
Yes definitely, also I think it could be useful for the dataset viewer to not decode the data but instead return either the bytes or the (possibly chained) URL. cc @severo
We want to merge this today/tomorrow, so I'd really appreciate your reviews @sgugger @nateraw. Also, you can test this feature on the existing image datasets (MNIST, beans, food101, ...) by installing `datasets` from the `add-image-feature` branch.
Thanks for the amazing work 💯 It's really nice to use!
Really great work - thanks for battling the issues on this one.
As promised, I took a deeper dive.
While putting together that notebook, I observed a couple of issues you may want to tackle later:
- Specifying features with Array3D didn't work for me when processing across the whole dataset. I think maybe I was doing something wrong, but can't tell. Take a look at the 2nd training example to see what I mean; I left a comment in the code.
- The PIL dependency upgrade in Colab is a bit annoying, as you have to restart the runtime to continue. I noticed an error that I'm guessing you ran into when I tried using the older version. It would be nice to not have to restart the runtime whenever I use `datasets` with the vision deps. Also, this could silently fail for folks trying to use it without installing the extras.
Outside of those 2 things, this LGTM 🚀
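The silent-failure concern could be handled with an explicit dependency check; here is a minimal sketch (the `require_backend` helper and its error message are made up for illustration, not the actual `datasets` API):

```python
import importlib.util

def require_backend(module_name: str, hint: str) -> bool:
    """Raise a clear ImportError instead of failing silently when an optional dep is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"Missing optional dependency '{module_name}'. {hint}")
    return True

# Succeeds for an installed module, raises a readable error otherwise
require_backend("json", "It ships with the standard library.")
```

Checking `find_spec` up front keeps the error close to the user's install step instead of surfacing as a confusing failure deep inside decoding.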
Thanks for the review @nateraw!
Thanks for playing with it @nateraw and for sharing your notebook, this is useful :) I think this is ready now, congrats @mariosasko!
Love this feature and hope to release soon!
Adds the Image feature. This feature is heavily inspired by the recently added Audio feature (#2324). Currently, this PR is pretty simple.
Some considerations that need further discussion:
- `Pillow`/`PIL` as the image decoding library. Another candidate I considered is `torchvision`, mostly because of its `accimage` backend, which should be faster for loading `jpeg` images than `Pillow`. However, `torchvision`'s io module only supports png and jpeg images, has `torch` as a hard dependency, and requires magic to work with image bytes (`torch.ByteTensor(torch.ByteStorage.from_buffer(image_bytes))`).
- Converting `PIL`'s `Image` type to `np.ndarray`. The vision models in Transformers such as ViT prefer the raw `Image` type and not the decoded tensors, so there is a small overhead due to this conversion. IMO this is justified to keep this part aligned with the Audio feature, which also returns `np.ndarray`. What do you think?
- … (`torch`, `tf`, `jax`).
- `Image.open` returns images of shape (H, W, C). However, ViT's feature extractor expects the format (C, H, W) if the image is passed as an array (explained here), so I'm more inclined to the format (C, H, W). Which one do you prefer, (C, H, W) or (H, W, C)?
- … (parameters that can be passed via `cast_column`, such as `sampling_rate` in the Audio feature).

TODOs:
- `ArrayND` …

There is a Colab Notebook where you can play with this feature.
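The channels-first question above boils down to a single axis permutation; a minimal NumPy sketch (the array here is a zero-filled stand-in for a decoded image):

```python
import numpy as np

# A decoded RGB image in (H, W, C) layout, as produced by np.asarray on a PIL image
img_hwc = np.zeros((224, 224, 3), dtype=np.uint8)

# Move the channel axis to the front for (C, H, W) consumers such as ViT
img_chw = np.transpose(img_hwc, (2, 0, 1))

print(img_hwc.shape)  # (224, 224, 3)
print(img_chw.shape)  # (3, 224, 224)
```

Since `np.transpose` returns a view, the conversion itself is essentially free; the real cost question is the `PIL.Image`-to-`np.ndarray` decode discussed above.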
I'm also adding a link to the Image feature in TFDS because one of our goals is to parse TFDS scripts eventually, so our Image feature has to (at least) support all the formats theirs does.
Feel free to cc anyone who might be interested.
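For comparison with the `torchvision` byte-handling magic mentioned in the considerations, decoding raw bytes with Pillow only needs an in-memory buffer; a small sketch (the PNG round-trip is just to produce example bytes):

```python
import io
from PIL import Image

def decode_image_bytes(image_bytes: bytes) -> Image.Image:
    # Pillow reads directly from an in-memory buffer; no tensor tricks required
    return Image.open(io.BytesIO(image_bytes))

# Round-trip a tiny image through PNG bytes to exercise the decode path
buf = io.BytesIO()
Image.new("RGB", (2, 2), (255, 0, 0)).save(buf, format="PNG")
img = decode_image_bytes(buf.getvalue())
print(img.size)  # (2, 2)
```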
P.S. Please ignore the changes in the `datasets/**/*.py` files 😄.