-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update format, fingerprint and indices after add_item #2254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will take this PR as an example to complete my PR on add_column
.
src/datasets/arrow_dataset.py
Outdated
if self._indices is None: | ||
indices_table = None | ||
else: | ||
new_indices_array = pa.array([len(self._data)], type=pa.uint64()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe calling it item_indices_table
to be consistent with item_table
above.
dataset.reset_format() | ||
dataset_to_test.reset_format() | ||
assert dataset[:-1] == dataset_to_test[:] | ||
assert {k: int(v) for k, v in dataset[-1].items()} == {k: int(v) for k, v in item.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to introduce an explicit test for the _indices?
For example, this test passes even if I wrongly set in add_item
:
new_indices_array = pa.array([9], type=pa.uint64())
I renamed the variable, added a test for dataset._indices and fixed an issue with class_encode_column |
Added fingerprint and format update wrappers + update the indices by adding the index of the newly added item in the table.