Add Image feature #3163

Merged
merged 69 commits into huggingface:master on Dec 6, 2021
Conversation

mariosasko
Collaborator

@mariosasko mariosasko commented Oct 25, 2021

Adds the Image feature. This feature is heavily inspired by the recently added Audio feature (#2324). Currently, this PR is pretty simple.

Some considerations that need further discussion:

  • I've decided to use Pillow/PIL as the image decoding library. Another candidate I considered is torchvision, mostly because of its accimage backend, which should be faster than Pillow for loading JPEG images. However, torchvision's io module only supports PNG and JPEG images, has torch as a hard dependency, and requires a workaround to load images from bytes (torch.ByteTensor(torch.ByteStorage.from_buffer(image_bytes))).
  • Currently, I'm converting PIL's Image type to np.ndarray. The vision models in Transformers such as ViT prefer the raw Image type and not the decoded tensors, so there is a small overhead due to this conversion. IMO this is justified to keep this part aligned with the Audio feature, which also returns np.ndarray. What do you think?
  • Still have to work on the channel decoding logic:
    • PyTorch prefers the channel-first ordering (C, H, W); TF and Flax the channel-last ordering (H, W, C). One cool feature would be adjusting the channel order based on the selected formatter (torch, tf, jax).
    • By default, Image.open returns images of shape (H, W, C). However, ViT's feature extractor expects the format (C, H, W) if the image is passed as an array (explained here), so I'm more inclined to the format (C, H, W). Which one do you prefer, (C, H, W) or (H, W, C)?
  • Are there any options you'd like to see? (the user could change those via cast_column, such as sampling_rate in the Audio feature)
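The channel-order question above can be illustrated with a minimal NumPy sketch (the array contents are placeholders; only the shapes matter):

```python
import numpy as np

# Pillow decodes images in channel-last order: (H, W, C).
img_hwc = np.zeros((224, 224, 3), dtype=np.uint8)

# PyTorch models typically expect channel-first order: (C, H, W).
img_chw = np.transpose(img_hwc, (2, 0, 1))

print(img_hwc.shape)  # (224, 224, 3)
print(img_chw.shape)  # (3, 224, 224)
```

Since np.transpose returns a view, adjusting the order in a formatter would be cheap; the copy only happens if a framework later requires a contiguous array.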

TODOs:

Colab Notebook where you can play with this feature.

I'm also adding a link to the Image feature in TFDS because one of our goals is to parse TFDS scripts eventually, so our Image feature has to (at least) support all the formats theirs does.
Feel free to cc anyone who might be interested.

P.S. Please ignore the changes in the datasets/**/*.py files 😄.

@mariosasko mariosasko changed the title Add image feature Add Image feature Oct 25, 2021
Contributor

@nateraw nateraw left a comment


Thanks a lot for working on this!! Looks great so far.

  • I would prefer channels first by default.
  • I don't think animated GIFs make sense to go inside Image, but I could be convinced otherwise.
  • It would be cool to apply the Image feature to the datasets where it makes sense (beans, cats_vs_dogs, etc.), but that can also be done in a separate PR later if that's easier to manage.

@lhoestq
Member

lhoestq commented Oct 29, 2021

Awesome, looking forward to using it :)

Member

@lhoestq lhoestq left a comment


It looks all good ! And thanks for also adding all those tests :)
I played with it and it works wonderfully

I added some comments, mostly to add some typing.

I also found one use case that we can fix later (because it also affects the Audio feature anyway): if you have a dataset with an Image or Audio type and you map with the identity function, then it's replaced with a string type.

mariosasko and others added 2 commits December 1, 2021 12:50
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@mariosasko
Collaborator Author

I'm marking this PR as ready for review.

Thanks to @sgugger's comment, the API is much more flexible now as it decodes images (lazily) as PIL.Image.Image objects and supports transforms directly on them.

Also, we no longer return paths explicitly (previously, we would return {"path": image_path, "image": pil_image}) for the following reasons:

  • it's unclear what to return when reading an image from a URL or a NumPy array. We could set path to None in these situations, but IMO we should avoid redundant information.
  • returning a dict doesn't mesh nicely with the requirement of supporting image modifications: what should happen if the user modifies both the image path and the image itself?

(Btw, for images stored locally, you can access their paths with dset[idx]["image"].filename, or avoid decoding entirely with paths = [ex["path"] for ex in dset]. @lhoestq @albertvillanova WDYT about having an option to skip decoding for complex features, e.g. Audio(decode=False)? This way, the user can easily access the underlying data.)
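If such a decode=False option were added, access to the underlying data might look roughly like this pure-Python sketch (the storage layout and names here are illustrative, not the actual datasets internals):

```python
# Illustrative sketch: a decoded feature yields a ready-to-use object,
# while decode=False would surface the raw storage (path and/or bytes).

def access(example: dict, decode: bool):
    """Return the decoded object, or the raw storage when decode is False."""
    if decode:
        return example["decoded"]
    return {"path": example["path"], "bytes": example["bytes"]}

raw = {"path": "cat.png", "bytes": b"\x89PNG...", "decoded": "<PIL.Image.Image>"}
print(access(raw, decode=False))  # {'path': 'cat.png', 'bytes': b'\x89PNG...'}
print(access(raw, decode=True))   # <PIL.Image.Image>
```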

Examples of what you can do:

# load local images
dset = Dataset.from_dict({"image": [local_image_path]}, features=Features({"image": Image()}))
# load remote images (we got this for free by adding support for streaming)
dset = Dataset.from_dict({"image": [image_url]}, features=Features({"image": Image()}))
# from np.ndarray
dset = Dataset.from_dict({"image": [np.array(...)]}, features=Features({"image": Image()}))
# cast column
dset = Dataset.from_dict({"image": [local_image_path]})
dset = dset.cast_column("image", Image())

# automatic type inference
dset = Dataset.from_dict({"image": [PIL.Image.open(local_image_path)]})

# transforms
def img_transform(example):
    ...
    example["image"] = transformed_pil_image_or_np_ndarray
    return example
dset = dset.map(img_transform)

# transform that adds a new column with images (automatic inference of the feature type)
dset = dset.map(lambda ex: {"image_resized": ex["image"].resize((100, 100))})
print(dset.features["image_resized"])  # prints Image()

Some more cool features:

  • We store the image filename (pil_image.filename) whenever possible to avoid a costly conversion to bytes.
  • If possible, we use the image's native compression when encoding; otherwise, we fall back to the lossless PNG format (e.g. after image ops or when storing NumPy arrays).
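The encoding strategy in the last bullet could be sketched like this (a simplified illustration under my own assumptions, not the actual implementation; assumes Pillow is installed):

```python
import io
from PIL import Image as PILImage

def encode_image(image: PILImage.Image) -> dict:
    """Simplified sketch: reuse the source file when available, otherwise
    serialize with the image's native format, falling back to lossless PNG."""
    # PIL.Image.open sets .filename; images built in memory don't have one.
    filename = getattr(image, "filename", "")
    if filename:
        return {"path": filename, "bytes": None}
    buffer = io.BytesIO()
    # .format is None for images created in memory -> PNG fallback.
    image.save(buffer, format=image.format or "PNG")
    return {"path": None, "bytes": buffer.getvalue()}

# A freshly created image has no filename or format, so PNG is used.
encoded = encode_image(PILImage.new("RGB", (4, 4)))
print(encoded["bytes"][:4])  # b'\x89PNG'
```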

Hints to make reviewing easier:

  • feel free to ignore the extension type part because it's related to PyArrow internals.
  • also, let me know if we are too strict/too flexible in terms of the types the Image feature can encode/decode. Hints:
    • encode_example handles encoding during dataset generation (you can think of it as yield key, features.encode_example(example))
    • objects_to_list_of_image_dicts handles encoding of returned examples in map

P.S. I'll fork the PR branch and start adding the Image feature to the existing image datasets (will also update the ImageClassification template while doing that).

@mariosasko mariosasko marked this pull request as ready for review December 1, 2021 14:32
@lhoestq
Member

lhoestq commented Dec 2, 2021

WDYT about having an option to skip decoding for complex features, e. g. Audio(decode=False)?

Yes definitely, also I think it could be useful for the dataset viewer to not decode the data but instead return either the bytes or the (possibly chained) URL. cc @severo

@mariosasko
Collaborator Author

We want to merge this today/tomorrow, so I'd really appreciate your reviews @sgugger @nateraw.

Also, you can test this feature on the existing image datasets (MNIST, beans, food101, ...) by installing datasets from the PR branch:

pip install git+https://github.com/huggingface/datasets.git@adapt-image-datasets

Member

@lhoestq lhoestq left a comment


Thanks for the amazing work 💯 It's really nice to use !

Contributor

@nateraw nateraw left a comment


Really great work - thanks for battling the issues on this one.

As promised, I took a deeper dive.

Open In Colab

While putting that notebook together, I observed a couple of issues you may want to tackle later:

  1. Specifying features with Array3D didn't work for me when processing across the whole dataset. I think maybe I was doing something wrong, but I can't tell. Take a look at the 2nd training example to see what I mean; I left a comment in the code.
  2. The PIL dependency upgrade in Colab is a bit annoying, as you have to restart the runtime to continue. I noticed an error that I'm guessing you ran into when I tried using the older version... it would be nice to not have to restart the runtime whenever I use datasets with vision deps. Also, this could silently fail for folks trying to use it without installing the extras.

Outside of those 2 things, this LGTM 🚀

@mariosasko
Collaborator Author

Thanks for the review @nateraw!

  1. This is a copy of your notebook with the fixed map call: https://colab.research.google.com/gist/mariosasko/e351a717682a9392ca03908e65a2600e/image-feature-demo.ipynb
    (Sorry for misleading you with the map call in my un-updated notebook)
    Also, we can avoid this cast by trying to infer the type of the column ("pixel_values") returned by the image feature extractor (we are already doing something similar for columns with the names "attention_mask", "input_ids", etc.). I plan to add this QOL improvement soon.

  2. It should work OK even without updating Pillow and PyArrow (these two libraries are pre-installed in Colab, so updating them requires a restart of the runtime).

    I noticed an error that I'm guessing you ran into when I tried using the older version

    Do you recall which type of error it was? Everything works fine on my side when I run the notebooks with the lowest supported version of Pillow (6.2.1).
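The column-name-based inference mentioned in point 1 could look roughly like this sketch (the mapping and function name below are illustrative, not the actual datasets internals; "pixel_values" is the hypothetical addition discussed above):

```python
# Illustrative mapping from well-known column names to feature types.
# The real library infers types from values; this sketch short-circuits
# that inference for columns whose names are well known.
KNOWN_COLUMN_TYPES = {
    "attention_mask": "Sequence(Value('int8'))",
    "input_ids": "Sequence(Value('int32'))",
    "pixel_values": "Array3D(dtype='float32')",  # hypothetical addition
}

def infer_feature_type(column_name: str) -> str:
    """Return a known feature type for the column, or a marker meaning
    'infer from the values' when the name is not recognized."""
    return KNOWN_COLUMN_TYPES.get(column_name, "infer-from-values")

print(infer_feature_type("pixel_values"))  # Array3D(dtype='float32')
print(infer_feature_type("my_column"))     # infer-from-values
```

With such a lookup in place, users would no longer need to cast "pixel_values" to Array3D by hand after running a feature extractor in map.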

@lhoestq
Member

lhoestq commented Dec 6, 2021

Thanks for playing with it @nateraw and for sharing your notebook, this is useful :)

I think this is ready now, congrats @mariosasko !

@lhoestq lhoestq merged commit 76bb459 into huggingface:master Dec 6, 2021
@mariosasko mariosasko deleted the add-image-feature branch December 6, 2021 18:35
@tshu-w

tshu-w commented Dec 21, 2021

Love this feature and hope it's released soon!
