Add Image feature #3163

Merged
merged 69 commits into huggingface:master on Dec 6, 2021
Conversation

mariosasko
Collaborator

@mariosasko mariosasko commented Oct 25, 2021

Adds the Image feature. This feature is heavily inspired by the recently added Audio feature (#2324). Currently, this PR is pretty simple.

Some considerations that need further discussion:

  • I've decided to use Pillow/PIL as the image decoding library. Another candidate I considered is torchvision, mostly because of its accimage backend, which should be faster than Pillow for loading JPEG images. However, torchvision's io module only supports PNG and JPEG images, has torch as a hard dependency, and requires a workaround to load images from bytes (torch.ByteTensor(torch.ByteStorage.from_buffer(image_bytes))).
  • Currently, I'm converting PIL's Image type to np.ndarray. The vision models in Transformers such as ViT prefer the raw Image type and not the decoded tensors, so there is a small overhead due to this conversion. IMO this is justified to keep this part aligned with the Audio feature, which also returns np.ndarray. What do you think?
  • Still have to work on the channel decoding logic:
    • PyTorch prefers the channel-first ordering (C, H, W); TF and Flax the channel-last ordering (H, W, C). One cool feature would be adjusting the channel order based on the selected formatter (torch, tf, jax).
    • By default, Image.open returns images of shape (H, W, C). However, ViT's feature extractor expects the format (C, H, W) if the image is passed as an array (explained here), so I'm more inclined to the format (C, H, W). Which one do you prefer, (C, H, W) or (H, W, C)?
  • Are there any options you'd like to see? (the user could change those via cast_column, such as sampling_rate in the Audio feature)
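The channel-order question above can be illustrated with a minimal NumPy sketch (the array contents are placeholders; only the shapes matter):

```python
import numpy as np

# Pillow decodes images in channel-last order: (H, W, C).
img_hwc = np.zeros((224, 224, 3), dtype=np.uint8)

# PyTorch models typically expect channel-first order: (C, H, W).
img_chw = np.transpose(img_hwc, (2, 0, 1))

print(img_hwc.shape)  # (224, 224, 3)
print(img_chw.shape)  # (3, 224, 224)
```

Since np.transpose returns a view, adjusting the order in a formatter would be cheap; the copy only happens if a framework later requires a contiguous array.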

TODOs:

Colab Notebook where you can play with this feature.

I'm also adding a link to the Image feature in TFDS because one of our goals is to parse TFDS scripts eventually, so our Image feature has to (at least) support all the formats theirs does.
Feel free to cc anyone who might be interested.

P.S. Please ignore the changes in the datasets/**/*.py files 😄.

@mariosasko mariosasko changed the title Add image feature Add Image feature Oct 25, 2021
Contributor

@nateraw nateraw left a comment


Thanks a lot for working on this!! Looks great so far.

  • I would prefer channels first by default.
  • I don't think animated GIFs make sense to go inside Image, but I could be convinced otherwise.
  • It would be cool to apply the Image feature to the datasets where it makes sense (beans, cats_vs_dogs, etc.), but that can also be done in a separate PR later if that's easier to manage.

@lhoestq
Member

lhoestq commented Oct 29, 2021

Awesome, looking forward to using it :)

Member

@lhoestq lhoestq left a comment


It looks all good ! And thanks for also adding all those tests :)
I played with it and it works wonderfully

I added some comments, mostly to add some typing.

I also found one use case that we can fix later (because it also affects the Audio feature anyway): if you have a dataset with an Image or Audio type and you map with the identity function, then it's replaced with a string type.

mariosasko and others added 2 commits December 1, 2021 12:50
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@mariosasko
Collaborator Author

I'm marking this PR as ready for review.

Thanks to @sgugger's comment, the API is much more flexible now as it decodes images (lazily) as PIL.Image.Image objects and supports transforms directly on them.

Also, we no longer return paths explicitly (previously, we would return {"path": image_path, "image": pil_image}) for the following reasons:

  • it's unclear what to return when reading an image from a URL or a NumPy array. We could set path to None in these situations, but IMO we should avoid redundant information.
  • returning a dict doesn't mesh nicely with the requirement of supporting image modifications: what should happen if the user modifies both the image path and the image itself?

(Btw, for images stored locally, you can access their paths with dset[idx]["image"].filename, or avoid decoding entirely with paths = [ex["path"] for ex in dset]. @lhoestq @albertvillanova WDYT about having an option to skip decoding for complex features, e.g. Audio(decode=False)? This way, the user can easily access the underlying data.)
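If such a decode=False option were added, access to the underlying data might look roughly like this pure-Python sketch (the storage layout and names here are illustrative, not the actual datasets internals):

```python
# Illustrative sketch: a decoded feature yields a ready-to-use object,
# while decode=False would surface the raw storage (path and/or bytes).

def access(example: dict, decode: bool):
    """Return the decoded object, or the raw storage when decode is False."""
    if decode:
        return example["decoded"]
    return {"path": example["path"], "bytes": example["bytes"]}

raw = {"path": "cat.png", "bytes": b"\x89PNG...", "decoded": "<PIL.Image.Image>"}
print(access(raw, decode=False))  # {'path': 'cat.png', 'bytes': b'\x89PNG...'}
print(access(raw, decode=True))   # <PIL.Image.Image>
```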

Examples of what you can do:

# load local images
dset = Dataset.from_dict({"image": [local_image_path]}, features=Features({"image": Image()}))
# load remote images (we got this for free by adding support for streaming)
dset = Dataset.from_dict({"image": [image_url]}, features=Features({"image": Image()}))
# from np.ndarray
dset = Dataset.from_dict({"image": [np.array(...)]}, features=Features({"image": Image()}))
# cast column
dset = Dataset.from_dict({"image": [local_image_path]})
dset = dset.cast_column("image", Image())

# automatic type inference
dset = Dataset.from_dict({"image": [PIL.Image.open(local_image_path)]})

# transforms
def img_transform(example):
    ...
    example["image"] = transformed_pil_image_or_np_ndarray
    return example
dset = dset.map(img_transform)

# transform that adds a new column with images (automatic inference of the feature type)
dset = dset.map(lambda ex: {"image_resized": ex["image"].resize((100, 100))})
print(dset.features["image_resized"])  # prints Image()

Some more cool features:

  • We store the image filename (pil_image.filename) whenever possible to avoid a costly conversion to bytes.
  • If possible, we use the image's native compression when encoding; otherwise, we fall back to the lossless PNG format (e.g. after image ops or when storing NumPy arrays).
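The encoding strategy in the last bullet could be sketched like this (a simplified illustration under my own assumptions, not the actual implementation; assumes Pillow is installed):

```python
import io
from PIL import Image as PILImage

def encode_image(image: PILImage.Image) -> dict:
    """Simplified sketch: reuse the source file when available, otherwise
    serialize with the image's native format, falling back to lossless PNG."""
    # PIL.Image.open sets .filename; images built in memory don't have one.
    filename = getattr(image, "filename", "")
    if filename:
        return {"path": filename, "bytes": None}
    buffer = io.BytesIO()
    # .format is None for images created in memory -> PNG fallback.
    image.save(buffer, format=image.format or "PNG")
    return {"path": None, "bytes": buffer.getvalue()}

# A freshly created image has no filename or format, so PNG is used.
encoded = encode_image(PILImage.new("RGB", (4, 4)))
print(encoded["bytes"][:4])  # b'\x89PNG'
```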

Hints to make reviewing easier:

  • feel free to ignore the extension type part because it's related to PyArrow internals.
  • also, let me know if we are too strict/too flexible in terms of the types the Image feature can encode/decode. Hints:
    • encode_example handles encoding during dataset generation (you can think of it as yield key, features.encode_example(example))
    • objects_to_list_of_image_dicts handles encoding of returned examples in map

P.S. I'll fork the PR branch and start adding the Image feature to the existing image datasets (will also update the ImageClassification template while doing that).

@mariosasko mariosasko marked this pull request as ready for review December 1, 2021 14:32
@lhoestq
Member

lhoestq commented Dec 2, 2021

WDYT about having an option to skip decoding for complex features, e. g. Audio(decode=False)?

Yes definitely, also I think it could be useful for the dataset viewer to not decode the data but instead return either the bytes or the (possibly chained) URL. cc @severo

@mariosasko
Collaborator Author

We want to merge this today/tomorrow, so I'd really appreciate your reviews @sgugger @nateraw.

Also, you can test this feature on the existing image datasets (MNIST, beans, food101, ...) by installing datasets from the PR branch:

pip install git+https://github.com/huggingface/datasets.git@adapt-image-datasets

Member

@lhoestq lhoestq left a comment


Thanks for the amazing work 💯 It's really nice to use !

Contributor

@nateraw nateraw left a comment


Really great work - thanks for battling the issues on this one.

As promised, I took a deeper dive.

Open In Colab

While putting that notebook together, I observed a couple of issues you may want to tackle later:

  1. Specifying features with Array3D didn't work for me when processing across the whole dataset. I think maybe I was doing something wrong, but I can't tell. Take a look at the 2nd training example to see what I mean; I left a comment in the code.
  2. The PIL dependency upgrade in Colab is a bit annoying, as you have to restart the runtime to continue. I noticed an error that I'm guessing you ran into when I tried using the older version... it would be nice to not have to restart the runtime whenever I use datasets with vision deps. Also, this could silently fail for folks trying to use it without installing the extras.

Outside of those 2 things, this LGTM 🚀

@mariosasko
Collaborator Author

Thanks for the review @nateraw!

  1. This is a copy of your notebook with the fixed map call: https://colab.research.google.com/gist/mariosasko/e351a717682a9392ca03908e65a2600e/image-feature-demo.ipynb
    (Sorry for misleading you with the map call in my un-updated notebook)
    Also, we can avoid this cast by trying to infer the type of the column ("pixel_values") returned by the image feature extractor (we are already doing something similar for columns with the names "attention_mask", "input_ids", etc.). I plan to add this QOL improvement soon.

  2. It should work OK even without updating Pillow and PyArrow (these two libraries are pre-installed in Colab, so updating them requires a restart of the runtime).

    I noticed an error that I'm guessing you ran into when I tried using the older version

    Do you recall which type of error it was? Everything works fine on my side when I run the notebooks with the lowest supported version of Pillow (6.2.1).
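The column-name-based inference mentioned in point 1 could look roughly like this sketch (the mapping and function name below are illustrative, not the actual datasets internals; "pixel_values" is the hypothetical addition discussed above):

```python
# Illustrative mapping from well-known column names to feature types.
# The real library infers types from values; this sketch short-circuits
# that inference for columns whose names are well known.
KNOWN_COLUMN_TYPES = {
    "attention_mask": "Sequence(Value('int8'))",
    "input_ids": "Sequence(Value('int32'))",
    "pixel_values": "Array3D(dtype='float32')",  # hypothetical addition
}

def infer_feature_type(column_name: str) -> str:
    """Return a known feature type for the column, or a marker meaning
    'infer from the values' when the name is not recognized."""
    return KNOWN_COLUMN_TYPES.get(column_name, "infer-from-values")

print(infer_feature_type("pixel_values"))  # Array3D(dtype='float32')
print(infer_feature_type("my_column"))     # infer-from-values
```

With such a lookup in place, users would no longer need to cast "pixel_values" to Array3D by hand after running a feature extractor in map.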

@lhoestq
Member

lhoestq commented Dec 6, 2021

Thanks for playing with it @nateraw and for sharing your notebook, this is useful :)

I think this is ready now, congrats @mariosasko !

@lhoestq lhoestq merged commit 76bb459 into huggingface:master Dec 6, 2021
@mariosasko mariosasko deleted the add-image-feature branch December 6, 2021 18:35
@tshu-w

tshu-w commented Dec 21, 2021

Love this feature and hope it's released soon!
