
Add Image feature #3163

Merged: 69 commits, merged on Dec 6, 2021

Commits (69)
42956f1
Initial commit
mariosasko Oct 21, 2021
93440cb
Add basic decoding
mariosasko Oct 23, 2021
4baaae7
Replace features.Audio with Audio
mariosasko Oct 23, 2021
be25e24
Add Image to package reference
mariosasko Oct 24, 2021
4660ce8
Use np.array
mariosasko Oct 25, 2021
b80633a
Update error msg
mariosasko Oct 25, 2021
ac25393
Add mode and channel decoding
mariosasko Oct 26, 2021
9946969
Fix return value
mariosasko Oct 26, 2021
ff53e94
Finish decoding
mariosasko Oct 26, 2021
d1a91af
Make CI happy
mariosasko Oct 26, 2021
e628c9c
Some more fixes
mariosasko Oct 26, 2021
bec851f
Minor doc fix
mariosasko Oct 27, 2021
fab6e0f
Remove animated option
mariosasko Oct 28, 2021
539ffb5
Pin version
mariosasko Oct 28, 2021
924f94b
Remove unused imports in setup.py
mariosasko Oct 29, 2021
69eb82c
Add vision requirements to setup.py
mariosasko Oct 29, 2021
9a1a8ff
Add initial tests
mariosasko Oct 29, 2021
9d29246
Delete other formats
mariosasko Nov 3, 2021
69b2f90
Make Image feature hashable
mariosasko Nov 3, 2021
50bee0a
Add more tests
mariosasko Nov 3, 2021
4533c51
Support numpy array in alter data check in TypedSequence
mariosasko Nov 3, 2021
1eec068
Fix TypedSequence converion
mariosasko Nov 3, 2021
3323d41
Finish tests
mariosasko Nov 3, 2021
bcc4ed4
Merge conflicts
mariosasko Nov 3, 2021
db9a3eb
Update Image - add ImageExtensionType and supporting functions
mariosasko Nov 10, 2021
d8e3ace
Update encoding functions
mariosasko Nov 10, 2021
f252dcb
Add support in TypedSequence for ImageExtensionType
mariosasko Nov 10, 2021
c7b905d
Add tests
mariosasko Nov 10, 2021
d61e334
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 10, 2021
d4d413a
Remove unused import
mariosasko Nov 10, 2021
c095040
Fix doc and style
mariosasko Nov 10, 2021
a5c3d8e
Fix doc indentation
mariosasko Nov 11, 2021
eab101a
Improve comment
mariosasko Nov 12, 2021
8a5449a
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 12, 2021
87b2504
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 16, 2021
2d63808
Return single image instead of dict
mariosasko Nov 17, 2021
2be6d1f
Fix merge conflict
mariosasko Nov 17, 2021
c21efa2
Return PIL Image and not dict
mariosasko Nov 19, 2021
856d2bc
Encode dict
mariosasko Nov 19, 2021
0fa8a6f
Update tests
mariosasko Nov 19, 2021
462210f
Style
mariosasko Nov 19, 2021
6391d3c
np.ndarray encoding/decoding
mariosasko Nov 22, 2021
ce82bff
Minor improvements
mariosasko Nov 23, 2021
95828bf
PIL Image support in cast_to_python_objects
mariosasko Nov 23, 2021
64382a2
Test cast
mariosasko Nov 23, 2021
0152aab
Doc fix
mariosasko Nov 23, 2021
de91d49
Extension type fixes
mariosasko Nov 23, 2021
f488d06
Style
mariosasko Nov 23, 2021
efe96e9
Use types_mapper in Dataset.to_pandas
mariosasko Nov 24, 2021
8c0f364
Add pandas extension array for image type
mariosasko Nov 24, 2021
30f0eb7
Update tests
mariosasko Nov 24, 2021
5c059d9
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 24, 2021
8df9479
image type inference
lhoestq Nov 26, 2021
78436d2
Remvoe cast_to_python test after Quentin's change
mariosasko Nov 26, 2021
db5ec0d
Improve tests
mariosasko Nov 26, 2021
c737af7
Add storage type
mariosasko Nov 29, 2021
b0425ba
Improve tests
mariosasko Nov 29, 2021
768b9f1
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Nov 29, 2021
544b093
Test map that returns np.ndarray
mariosasko Nov 30, 2021
606215d
Rename functions
mariosasko Nov 30, 2021
d7b204c
Add streaming test
mariosasko Nov 30, 2021
31f02ad
Use image struct in all situations
mariosasko Dec 1, 2021
6ee758e
Update src/datasets/features/image.py - encode_example type hint
mariosasko Dec 1, 2021
f294e5b
Update src/datasets/features/image.py -list_image_compression_formats…
mariosasko Dec 1, 2021
dff7b46
Merge branch 'master' of https://github.com/huggingface/datasets into…
mariosasko Dec 2, 2021
25d8902
Support str in encode_objects_to_image_dicts
mariosasko Dec 2, 2021
a940a8f
Merge branch 'add-image-feature' of github.com:mariosasko/datasets-1 …
mariosasko Dec 2, 2021
1109699
Update src/datasets/features/image.py - objects_to_list_of_image_dict…
mariosasko Dec 2, 2021
79c87f8
Style
mariosasko Dec 2, 2021
2 changes: 1 addition & 1 deletion datasets/arabic_speech_corpus/arabic_speech_corpus.py

@@ -85,7 +85,7 @@ def _info(self):
{
"file": datasets.Value("string"),
"text": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=48_000),
+ "audio": datasets.Audio(sampling_rate=48_000),
"phonetic": datasets.Value("string"),
"orthographic": datasets.Value("string"),
}
2 changes: 1 addition & 1 deletion datasets/common_language/common_language.py

@@ -110,7 +110,7 @@ def _info(self):
{
"client_id": datasets.Value("string"),
"path": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=48_000),
+ "audio": datasets.Audio(sampling_rate=48_000),
"sentence": datasets.Value("string"),
"age": datasets.Value("string"),
"gender": datasets.Value("string"),
2 changes: 1 addition & 1 deletion datasets/common_voice/common_voice.py

@@ -631,7 +631,7 @@ def _info(self):
{
"client_id": datasets.Value("string"),
"path": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=48_000),
+ "audio": datasets.Audio(sampling_rate=48_000),
"sentence": datasets.Value("string"),
"up_votes": datasets.Value("int64"),
"down_votes": datasets.Value("int64"),
2 changes: 1 addition & 1 deletion datasets/covost2/covost2.py

@@ -96,7 +96,7 @@ def _info(self):
features=datasets.Features(
client_id=datasets.Value("string"),
file=datasets.Value("string"),
- audio=datasets.features.Audio(sampling_rate=16_000),
+ audio=datasets.Audio(sampling_rate=16_000),
sentence=datasets.Value("string"),
translation=datasets.Value("string"),
id=datasets.Value("string"),
2 changes: 1 addition & 1 deletion datasets/librispeech_asr/librispeech_asr.py

@@ -102,7 +102,7 @@ def _info(self):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"text": datasets.Value("string"),
"speaker_id": datasets.Value("int64"),
"chapter_id": datasets.Value("int64"),
2 changes: 1 addition & 1 deletion datasets/lj_speech/lj_speech.py

@@ -74,7 +74,7 @@ def _info(self):
features=datasets.Features(
{
"id": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=22050),
+ "audio": datasets.Audio(sampling_rate=22050),
"file": datasets.Value("string"),
"text": datasets.Value("string"),
"normalized_text": datasets.Value("string"),
2 changes: 1 addition & 1 deletion datasets/openslr/openslr.py

@@ -538,7 +538,7 @@ def _info(self):
features = datasets.Features(
{
"path": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=48_000),
+ "audio": datasets.Audio(sampling_rate=48_000),
"sentence": datasets.Value("string"),
}
)
12 changes: 6 additions & 6 deletions datasets/superb/superb.py

@@ -137,7 +137,7 @@ class Superb(datasets.GeneratorBasedBuilder):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"text": datasets.Value("string"),
"speaker_id": datasets.Value("int64"),
"chapter_id": datasets.Value("int64"),
@@ -162,7 +162,7 @@ class Superb(datasets.GeneratorBasedBuilder):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"label": datasets.ClassLabel(
names=[
"yes",
@@ -196,7 +196,7 @@ class Superb(datasets.GeneratorBasedBuilder):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"speaker_id": datasets.Value("string"),
"text": datasets.Value("string"),
"action": datasets.ClassLabel(
@@ -238,7 +238,7 @@ class Superb(datasets.GeneratorBasedBuilder):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
# VoxCeleb1 contains 1251 speaker IDs in range ["id10001",..."id11251"]
"label": datasets.ClassLabel(names=[f"id{i + 10001}" for i in range(1251)]),
}
@@ -261,7 +261,7 @@ class Superb(datasets.GeneratorBasedBuilder):
{
"record_id": datasets.Value("string"),
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"start": datasets.Value("int64"),
"end": datasets.Value("int64"),
"speakers": [
@@ -289,7 +289,7 @@ class Superb(datasets.GeneratorBasedBuilder):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"label": datasets.ClassLabel(names=["neu", "hap", "ang", "sad"]),
}
),
2 changes: 1 addition & 1 deletion datasets/timit_asr/timit_asr.py

@@ -77,7 +77,7 @@ def _info(self):
features=datasets.Features(
{
"file": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"text": datasets.Value("string"),
"phonetic_detail": datasets.Sequence(
{
2 changes: 1 addition & 1 deletion datasets/vivos/vivos.py

@@ -67,7 +67,7 @@ def _info(self):
{
"speaker_id": datasets.Value("string"),
"path": datasets.Value("string"),
- "audio": datasets.features.Audio(sampling_rate=16_000),
+ "audio": datasets.Audio(sampling_rate=16_000),
"sentence": datasets.Value("string"),
}
),
3 changes: 3 additions & 0 deletions docs/source/package_reference/main_classes.rst

@@ -125,6 +125,9 @@ Dictionary with split names as keys ('train', 'test' for example), and :obj:`dat
.. autoclass:: datasets.Audio
:members:

+ .. autoclass:: datasets.Image
+ :members:
+
``MetricInfo``
~~~~~~~~~~~~~~~~~~~~~
10 changes: 7 additions & 3 deletions setup.py

@@ -62,10 +62,7 @@
Push the commit to remote: "git push origin master"
"""

- import datetime
- import itertools
import os
- import sys

from setuptools import find_packages, setup

@@ -108,6 +105,10 @@
"librosa",
]

+ VISION_REQURE = [
+     "Pillow>=6.2.1",
+ ]
+
BENCHMARKS_REQUIRE = [
"numpy==1.18.5",
"tensorflow==2.3.0",
@@ -167,6 +168,8 @@
"importlib_resources;python_version<'3.7'",
]

+ TESTS_REQUIRE.extend(VISION_REQURE)
+
if os.name != "nt":
# dependencies of unbabel-comet
# only test if not on windows since there're issues installing fairseq on windows
@@ -185,6 +188,7 @@

EXTRAS_REQUIRE = {
"audio": AUDIO_REQUIRE,
+ "vision": VISION_REQURE,
"apache-beam": ["apache-beam>=2.26.0"],
"tensorflow": ["tensorflow>=2.2.0,!=2.6.0,!=2.6.1"],
"tensorflow_gpu": ["tensorflow-gpu>=2.2.0,!=2.6.0,!=2.6.1"],
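With Pillow wired into both TESTS_REQUIRE and EXTRAS_REQUIRE, users would opt into the image dependencies via pip's extras syntax. A sketch, assuming the extra ships under the "vision" name defined in the diff above:

```shell
# Install datasets together with the new vision extra
# (pulls in Pillow>=6.2.1 per the setup.py hunk above).
pip install "datasets[vision]"
```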
1 change: 1 addition & 0 deletions src/datasets/__init__.py

@@ -45,6 +45,7 @@
Audio,
ClassLabel,
Features,
+ Image,
Sequence,
Translation,
TranslationVariableLanguages,
6 changes: 3 additions & 3 deletions src/datasets/arrow_dataset.py

@@ -46,7 +46,7 @@
from . import config, utils
from .arrow_reader import ArrowReader
from .arrow_writer import ArrowWriter, OptimizedTypedSequence
- from .features import ClassLabel, Features, Sequence, Value, _ArrayXD
+ from .features import ClassLabel, Features, Sequence, Value, _ArrayXD, pandas_types_mapper
from .filesystems import extract_path_from_uri, is_remote_filesystem
from .fingerprint import (
fingerprint_transform,
@@ -3280,15 +3280,15 @@ def to_pandas(
table=self._data,
key=slice(0, len(self)),
indices=self._indices if self._indices is not None else None,
- ).to_pandas()
+ ).to_pandas(types_mapper=pandas_types_mapper)
else:
batch_size = batch_size if batch_size else config.DEFAULT_MAX_BATCH_SIZE
return (
query_table(
table=self._data,
key=slice(offset, offset + batch_size),
indices=self._indices if self._indices is not None else None,
- ).to_pandas()
+ ).to_pandas(types_mapper=pandas_types_mapper)
for offset in range(0, len(self), batch_size)
)
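The `types_mapper` hook that `to_pandas` now forwards follows a simple contract: pyarrow consults the mapper once per column type, and a `None` return falls back to the default conversion while a pandas extension dtype overrides it. A standalone sketch of that contract — the `ImageDtype` class and the `is_image` attribute are illustrative stand-ins, not the PR's actual `PandasImageExtensionDtype` / `ImageExtensionType`:

```python
class ImageDtype:
    """Stand-in for a pandas ExtensionDtype (hypothetical)."""


def types_mapper(pa_type):
    # In the PR this is isinstance(pa_type, ImageExtensionType); here we
    # key off a made-up marker attribute to keep the sketch dependency-free.
    if getattr(pa_type, "is_image", False):
        return ImageDtype()
    return None  # None -> pyarrow uses its default Arrow-to-pandas conversion
```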
17 changes: 15 additions & 2 deletions src/datasets/arrow_writer.py

@@ -17,6 +17,7 @@
import json
import os
import socket
+ import sys
from dataclasses import asdict
from typing import Any, Dict, List, Optional, Tuple, Union

@@ -26,10 +27,12 @@
from . import config, utils
from .features import (
Features,
+ ImageExtensionType,
_ArrayXDExtensionType,
cast_to_python_objects,
list_of_np_array_to_pyarrow_listarray,
numpy_to_pyarrow_listarray,
+ objects_to_list_of_image_dicts,
)
from .info import DatasetInfo
from .keyhash import DuplicatedKeysError, KeyHasher
@@ -49,7 +52,7 @@ class TypedSequence:
More specifically it adds several features:
- Support extension types like ``datasets.features.Array2DExtensionType``:
By default pyarrow arrays don't return extension arrays. One has to call
- ``pa.ExtensionArray.from_storage(type, pa.array(data, type.storage_type_name))``
+ ``pa.ExtensionArray.from_storage(type, pa.array(data, type.storage_type))``
in order to get an extension array.
- Support for ``try_type`` parameter that can be used instead of ``type``:
When an array is transformed, we like to keep the same type as before if possible.
@@ -93,6 +96,10 @@ def __init__(self, data, type=None, try_type=None, optimized_int_type=None):

def __arrow_array__(self, type=None):
"""This function is called when calling pa.array(typed_sequence)"""
+
+ if config.PIL_AVAILABLE and "PIL" in sys.modules:
+ import PIL.Image
+
if type is not None:
raise ValueError("TypedSequence is supposed to be used with pa.array(typed_sequence, type=None)")
trying_type = False
@@ -104,6 +111,9 @@ def __arrow_array__(self, type=None):
else:
type = self.type
trying_int_optimization = False
+ if type is None:  # automatic type inference for custom objects
+ if config.PIL_AVAILABLE and "PIL" in sys.modules and isinstance(self.data[0], PIL.Image.Image):
+ type = ImageExtensionType()
try:
if isinstance(type, _ArrayXDExtensionType):
if isinstance(self.data, np.ndarray):
@@ -113,6 +123,9 @@ def __arrow_array__(self, type=None):
else:
storage = pa.array(self.data, type.storage_dtype)
out = pa.ExtensionArray.from_storage(type, storage)
+ elif isinstance(type, ImageExtensionType):
+ storage = pa.array(objects_to_list_of_image_dicts(self.data), type=type.storage_type)
+ out = pa.ExtensionArray.from_storage(type, storage)
elif isinstance(self.data, np.ndarray):
out = numpy_to_pyarrow_listarray(self.data)
if type is not None:
@@ -123,7 +136,7 @@
out = out.cast(type)
else:
out = pa.array(cast_to_python_objects(self.data, only_1d_for_numpy=True), type=type)
- if trying_type:
+ if trying_type and not isinstance(type, ImageExtensionType):
is_equal = (
np.array_equal(np.array(out[0].as_py()), self.data[0])
if isinstance(self.data[0], np.ndarray)
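Before handing image data to `pa.ExtensionArray.from_storage`, the writer normalizes each input into the uniform struct the extension's `storage_type` expects. A hypothetical pure-python sketch of what `objects_to_list_of_image_dicts` could look like for the path/dict cases — the names and the exact field handling are assumptions, not the PR's implementation:

```python
def to_image_dict(obj):
    """Normalize one image input into a {"path", "bytes"} storage dict (sketch)."""
    if isinstance(obj, str):  # a path to an image file on disk
        return {"path": obj, "bytes": None}
    if isinstance(obj, dict):  # already in storage form
        return {"path": obj.get("path"), "bytes": obj.get("bytes")}
    raise TypeError(f"unsupported image object of type {type(obj).__name__}")


def objects_to_image_dicts(objs):
    # The resulting list of uniform dicts can be fed to
    # pa.array(..., type=extension_type.storage_type).
    return [to_image_dict(obj) for obj in objs]
```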
4 changes: 4 additions & 0 deletions src/datasets/config.py

@@ -123,6 +123,10 @@
logger.info("Disabling Apache Beam because USE_BEAM is set to False")

+ # Optional tools for feature decoding
+ PIL_AVAILABLE = importlib.util.find_spec("PIL") is not None

# Optional compression tools
RARFILE_AVAILABLE = importlib.util.find_spec("rarfile") is not None
ZSTANDARD_AVAILABLE = importlib.util.find_spec("zstandard") is not None
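The `PIL_AVAILABLE` flag uses the standard pattern for detecting an optional dependency without importing it: `importlib.util.find_spec` returns `None` when the package is absent. A minimal self-contained sketch of the same pattern (the helper name is illustrative):

```python
import importlib.util


def is_available(package: str) -> bool:
    """True if `package` is importable, without actually importing it."""
    return importlib.util.find_spec(package) is not None


# Mirrors the config.py flag above.
PIL_AVAILABLE = is_available("PIL")
```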
1 change: 1 addition & 0 deletions src/datasets/features/__init__.py

@@ -8,4 +8,5 @@
_cast_to_python_objects,
_is_zero_copy_only,
)
+ from .image import Image, ImageExtensionType, objects_to_list_of_image_dicts
from .translation import Translation, TranslationVariableLanguages
22 changes: 19 additions & 3 deletions src/datasets/features/features.py

@@ -37,6 +37,7 @@

from datasets import config, utils
from datasets.features.audio import Audio
+ from datasets.features.image import Image, ImageExtensionType, PandasImageExtensionDtype
from datasets.features.translation import Translation, TranslationVariableLanguages
from datasets.utils.logging import get_logger

@@ -175,6 +176,9 @@ def _cast_to_python_objects(obj: Any, only_1d_for_numpy: bool) -> Tuple[Any, boo
if config.JAX_AVAILABLE and "jax" in sys.modules:
import jax.numpy as jnp

+ if config.PIL_AVAILABLE and "PIL" in sys.modules:
+ import PIL.Image
+
if isinstance(obj, np.ndarray):
if not only_1d_for_numpy or obj.ndim == 1:
return obj, False
@@ -197,6 +201,11 @@ def _cast_to_python_objects(obj: Any, only_1d_for_numpy: bool) -> Tuple[Any, boo
return np.asarray(obj), True
else:
return [_cast_to_python_objects(x, only_1d_for_numpy=only_1d_for_numpy)[0] for x in np.asarray(obj)], True
+ elif config.PIL_AVAILABLE and "PIL" in sys.modules and isinstance(obj, PIL.Image.Image):
+ if not only_1d_for_numpy:
+ return obj, False
+ else:
+ return [_cast_to_python_objects(x, only_1d_for_numpy=only_1d_for_numpy)[0] for x in np.array(obj)], True
elif isinstance(obj, pd.Series):
return obj.values.tolist(), True
elif isinstance(obj, pd.DataFrame):
@@ -471,7 +480,7 @@ class PandasArrayExtensionDtype(PandasExtensionDtype):
def __init__(self, value_type: Union["PandasArrayExtensionDtype", np.dtype]):
self._value_type = value_type

- def __from_arrow__(self, array):
+ def __from_arrow__(self, array: Union[pa.Array, pa.ChunkedArray]):
if array.type.shape[0] is None:
raise NotImplementedError(
"Dynamic first dimension is not supported for "
@@ -567,7 +576,7 @@ def __getitem__(self, item: Union[int, slice, np.ndarray]) -> Union[np.ndarray,
def take(
self, indices: Sequence_[int], allow_fill: bool = False, fill_value: bool = None
) -> "PandasArrayExtensionArray":
- indices: np.ndarray = np.asarray(indices, dtype="int")
+ indices: np.ndarray = np.asarray(indices, dtype=np.int)
if allow_fill:
fill_value = (
self.dtype.na_value if fill_value is None else np.asarray(fill_value, dtype=self.dtype.value_type)
@@ -599,6 +608,8 @@ def __eq__(self, other) -> np.ndarray:
def pandas_types_mapper(dtype):
if isinstance(dtype, _ArrayXDExtensionType):
return PandasArrayExtensionDtype(dtype.value_type)
+ elif isinstance(dtype, ImageExtensionType):
+ return PandasImageExtensionDtype()

@dataclass

@@ -759,6 +770,7 @@ class Sequence:
Array4D,
Array5D,
Audio,
+ Image,
]

@@ -849,7 +861,7 @@ def encode_nested_example(schema, obj):
return list(obj)
# Object with special encoding:
# ClassLabel will convert from string to int, TranslationVariableLanguages does some checks
- elif isinstance(schema, (Audio, ClassLabel, TranslationVariableLanguages, Value, _ArrayXD)):
+ elif isinstance(schema, (Audio, Image, ClassLabel, TranslationVariableLanguages, Value, _ArrayXD)):
return schema.encode_example(obj)
# Other object should be directly convertible to a native Arrow type (like Translation and Translation)
return obj
@@ -903,6 +915,8 @@ def generate_from_arrow_type(pa_type: pa.DataType) -> FeatureType:
elif isinstance(pa_type, _ArrayXDExtensionType):
array_feature = [None, None, Array2D, Array3D, Array4D, Array5D][pa_type.ndims]
return array_feature(shape=pa_type.shape, dtype=pa_type.value_type)
+ elif isinstance(pa_type, ImageExtensionType):
+ return Image()
elif isinstance(pa_type, pa.DictionaryType):
raise NotImplementedError # TODO(thom) this will need access to the dictionary as well (for labels). I.e. to the py_table
elif isinstance(pa_type, pa.DataType):
@@ -963,6 +977,8 @@ class Features(dict):
- a :class:`Array2D`, :class:`Array3D`, :class:`Array4D` or :class:`Array5D` feature for multidimensional arrays
- an :class:`Audio` feature to store the absolute path to an audio file or a dictionary with the relative path
to an audio file ("path" key) and its bytes content ("bytes" key). This feature extracts the audio data.
+ - an :class:`Image` feature to store the absolute path to an image file, an :obj:`np.ndarray` object, a :obj:`PIL.Image.Image` object
+ or a dictionary with the relative path to an image file ("path" key) and its bytes content ("bytes" key). This feature extracts the image data.
- :class:`datasets.Translation` and :class:`datasets.TranslationVariableLanguages`, the two features specific to Machine Translation
"""
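Per the Features docstring above, the Image feature accepts several input forms: a path string, an encoded `{"path", "bytes"}` dict, or an in-memory object (`PIL.Image.Image` / `np.ndarray` in the real feature). A dependency-free sketch of that dispatch — the classifier name and return labels are illustrative, not part of the PR:

```python
def classify_image_input(value):
    """Label which of the Image feature's accepted input forms `value` is (sketch)."""
    if isinstance(value, str):
        return "path"  # absolute path to an image file
    if isinstance(value, dict) and {"path", "bytes"} <= set(value):
        return "encoded"  # already in storage form
    return "object"  # e.g. PIL.Image.Image or np.ndarray in the real feature
```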