Add Image feature #3163

Merged, 69 commits
Dec 6, 2021
Commits
42956f1
Initial commit
mariosasko Oct 21, 2021
93440cb
Add basic decoding
mariosasko Oct 23, 2021
4baaae7
Replace features.Audio with Audio
mariosasko Oct 23, 2021
be25e24
Add Image to package reference
mariosasko Oct 24, 2021
4660ce8
Use np.array
mariosasko Oct 25, 2021
b80633a
Update error msg
mariosasko Oct 25, 2021
ac25393
Add mode and channel decoding
mariosasko Oct 26, 2021
9946969
Fix return value
mariosasko Oct 26, 2021
ff53e94
Finish decoding
mariosasko Oct 26, 2021
d1a91af
Make CI happy
mariosasko Oct 26, 2021
e628c9c
Some more fixes
mariosasko Oct 26, 2021
bec851f
Minor doc fix
mariosasko Oct 27, 2021
fab6e0f
Remove animated option
mariosasko Oct 28, 2021
539ffb5
Pin version
mariosasko Oct 28, 2021
924f94b
Remove unused imports in setup.py
mariosasko Oct 29, 2021
69eb82c
Add vision requirements to setup.py
mariosasko Oct 29, 2021
9a1a8ff
Add initial tests
mariosasko Oct 29, 2021
9d29246
Delete other formats
mariosasko Nov 3, 2021
69b2f90
Make Image feature hashable
mariosasko Nov 3, 2021
50bee0a
Add more tests
mariosasko Nov 3, 2021
4533c51
Support numpy array in alter data check in TypedSequence
mariosasko Nov 3, 2021
1eec068
Fix TypedSequence converion
mariosasko Nov 3, 2021
3323d41
Finish tests
mariosasko Nov 3, 2021
bcc4ed4
Merge conflicts
mariosasko Nov 3, 2021
db9a3eb
Update Image - add ImageExtensionType and supporting functions
mariosasko Nov 10, 2021
d8e3ace
Update encoding functions
mariosasko Nov 10, 2021
f252dcb
Add support in TypedSequence for ImageExtensionType
mariosasko Nov 10, 2021
c7b905d
Add tests
mariosasko Nov 10, 2021
d61e334
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 10, 2021
d4d413a
Remove unused import
mariosasko Nov 10, 2021
c095040
Fix doc and style
mariosasko Nov 10, 2021
a5c3d8e
Fix doc indentation
mariosasko Nov 11, 2021
eab101a
Improve comment
mariosasko Nov 12, 2021
8a5449a
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 12, 2021
87b2504
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 16, 2021
2d63808
Return single image instead of dict
mariosasko Nov 17, 2021
2be6d1f
Fix merge conflict
mariosasko Nov 17, 2021
c21efa2
Return PIL Image and not dict
mariosasko Nov 19, 2021
856d2bc
Encode dict
mariosasko Nov 19, 2021
0fa8a6f
Update tests
mariosasko Nov 19, 2021
462210f
Style
mariosasko Nov 19, 2021
6391d3c
np.ndarray encoding/decoding
mariosasko Nov 22, 2021
ce82bff
Minor improvements
mariosasko Nov 23, 2021
95828bf
PIL Image support in cast_to_python_objects
mariosasko Nov 23, 2021
64382a2
Test cast
mariosasko Nov 23, 2021
0152aab
Doc fix
mariosasko Nov 23, 2021
de91d49
Extension type fixes
mariosasko Nov 23, 2021
f488d06
Style
mariosasko Nov 23, 2021
efe96e9
Use types_mapper in Dataset.to_pandas
mariosasko Nov 24, 2021
8c0f364
Add pandas extension array for image type
mariosasko Nov 24, 2021
30f0eb7
Update tests
mariosasko Nov 24, 2021
5c059d9
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 24, 2021
8df9479
image type inference
lhoestq Nov 26, 2021
78436d2
Remvoe cast_to_python test after Quentin's change
mariosasko Nov 26, 2021
db5ec0d
Improve tests
mariosasko Nov 26, 2021
c737af7
Add storage type
mariosasko Nov 29, 2021
b0425ba
Improve tests
mariosasko Nov 29, 2021
768b9f1
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 29, 2021
544b093
Test map that returns np.ndarray
mariosasko Nov 30, 2021
606215d
Rename functions
mariosasko Nov 30, 2021
d7b204c
Add streaming test
mariosasko Nov 30, 2021
31f02ad
Use image struct in all situations
mariosasko Dec 1, 2021
6ee758e
Update src/datasets/features/image.py - encode_example type hint
mariosasko Dec 1, 2021
f294e5b
Update src/datasets/features/image.py -list_image_compression_formats…
mariosasko Dec 1, 2021
dff7b46
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Dec 2, 2021
25d8902
Support str in encode_objects_to_image_dicts
mariosasko Dec 2, 2021
a940a8f
Merge branch 'add-image-feature' of github.com:mariosasko/datasets-1 …
mariosasko Dec 2, 2021
1109699
Update src/datasets/features/image.py - objects_to_list_of_image_dict…
mariosasko Dec 2, 2021
79c87f8
Style
mariosasko Dec 2, 2021
Make CI happy
mariosasko committed Oct 26, 2021
commit d1a91af8e062a0fe11dbc5027071f0192ca33ec2
8 changes: 4 additions & 4 deletions datasets/ami/ami.py
@@ -318,31 +318,31 @@ def _info(self):

         if self.config.name == "headset-single":
             features_dict.update({"file": datasets.Value("string")})
-            features_dict.update({"audio": datasets.Audio(sampling_rate=16_000)})
+            features_dict.update({"audio": datasets.features.Audio(sampling_rate=16_000)})
             config_description = (
                 "Close talking audio of single headset. "
                 "This configuration only includes audio belonging to the "
                 "headset of the person currently speaking."
             )
         elif self.config.name == "microphone-single":
             features_dict.update({"file": datasets.Value("string")})
-            features_dict.update({"audio": datasets.Audio(sampling_rate=16_000)})
+            features_dict.update({"audio": datasets.features.Audio(sampling_rate=16_000)})
             config_description = (
                 "Far field audio of single microphone. "
                 "This configuration only includes audio belonging the first microphone, "
                 "*i.e.* 1-1, of the microphone array."
             )
         elif self.config.name == "headset-multi":
             features_dict.update({f"file-{i}": datasets.Value("string") for i in range(4)})
-            features_dict.update({f"file-{i}": datasets.Audio(sampling_rate=16_000) for i in range(4)})
+            features_dict.update({f"file-{i}": datasets.features.Audio(sampling_rate=16_000) for i in range(4)})
             config_description = (
                 "Close talking audio of four individual headset. "
                 "This configuration includes audio belonging to four individual headsets."
                 " For each annotation there are 4 audio files 0, 1, 2, 3."
             )
         elif self.config.name == "microphone-multi":
             features_dict.update({f"file-1-{i}": datasets.Value("string") for i in range(1, 8)})
-            features_dict.update({f"file-1-{i}": datasets.Audio(sampling_rate=16_000) for i in range(1, 8)})
+            features_dict.update({f"file-1-{i}": datasets.features.Audio(sampling_rate=16_000) for i in range(1, 8)})
             config_description = (
                 "Far field audio of microphone array. "
                 "This configuration includes audio of "