Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated snippets for exporting dataset to use load_dataset #4545

Merged
merged 7 commits into from
Jul 10, 2024

Conversation

minhtuev
Copy link
Contributor

@minhtuev minhtuev commented Jul 1, 2024

What changes are proposed in this pull request?

  1. Currently, snippets for exporting dataset use fo.Dataset(...), which is incorrect because this is the syntax for creating a new dataset. To load an existing dataset, use fo.load_dataset(...) instead.

Example snippet:

import fiftyone as fo
dataset_or_view = fo.Dataset("oi-v7full-2mm")
export_dir = "/path/for/export"
label_field = "ground_truth"

Error:

File ~/workspace/fiftyone-teams/fiftyone/core/dataset.py:117, in _validate_dataset_name(name, skip)
    115 clashing_name = clashing_name_doc["name"]
    116 if clashing_name == name:
--> 117     raise ValueError(f"Dataset name '{name}' is not available")
    118 else:
    119     raise ValueError(
    120         f"Dataset name '{name}' is not available: slug '{slug}' "
    121         f"in use by dataset '{clashing_name}'"
    122     )

ValueError: Dataset name 'oi-v7full-2mm' is not available

Correct snippet should be:

import fiftyone as fo
dataset_or_view = fo.load_dataset("oi-v7full-2mm")
  1. Add a parameter create_if_necessary to load_dataset to return an empty dataset if none currently exists.

How is this patch tested? If it is not, please explain why.

  1. Documentation should render correctly.
  2. Passing create_if_necessary to load_dataset should create a new dataset if an existing one does not exist.

Testing:

  • Manual testing (✅ )
  • Unit tests (✅ )

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features
    • Improved error handling with the introduction of DatasetNotFoundError for missing datasets.
  • Bug Fixes
    • Enhanced error messaging by replacing ValueError with DatasetNotFoundError when datasets are not found.
  • Tests
    • Added new tests validating dataset loading, creation, re-loading, and exception handling scenarios.

Copy link
Contributor

coderabbitai bot commented Jul 1, 2024

Walkthrough

These changes enhance the error handling and functionality of the FiftyOne dataset management system. A new DatasetNotFoundError exception is introduced to handle missing datasets more gracefully. The load_dataset function is updated to create a new dataset if it's not found, and the runner.py and unit tests are also adapted to utilize the new exception for consistency.

Changes

Files Change Summary
fiftyone/core/dataset.py Added DatasetNotFoundError class and modified load_dataset to handle missing datasets more effectively.
fiftyone/__public__.py Added DatasetNotFoundError to the list of imports.
fiftyone/migrations/runner.py Replaced ValueError with DatasetNotFoundError in get_dataset_revision for better error handling.
tests/unittests/dataset_tests.py Added a new test method test_load_dataset to validate the new dataset loading and exception handling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FiftyOne

    User->>FiftyOne: Call load_dataset(name)
    alt Dataset exists
        FiftyOne->>User: Return existing dataset
    else Dataset does not exist
        alt Creation allowed
            FiftyOne->>User: Create and return new dataset
        else Creation not allowed
            FiftyOne->>User: Raise DatasetNotFoundError
        end
    end
Loading

Poem

In the world of data flows,
A change to handle woes.
Datasets found or created anew,
Errors caught with grace, it's true.
FiftyOne now stands tall and proud,
For data's call, it answers loud.
🐰✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@minhtuev minhtuev self-assigned this Jul 1, 2024
@minhtuev
Copy link
Contributor Author

minhtuev commented Jul 1, 2024

Thanks @manivoxel51 and @allenleetc for helping me to debug my export/import snippet 💯

@swheaton
Copy link
Contributor

swheaton commented Jul 2, 2024

Hmm I think this is partially just user learning curve. On the other side of the coin, if the dataset did not exist then calling load_dataset() would fail too.

The intention is that it's not a fully executable code snippet, but that dataset_or_view should be set to a new dataset, loaded dataset, or any view into a dataset. From there, the remaining snippet lines would work.

Perhaps there is another way to denote that we aren't dictating how to load the dataset or view in the snippet but the focus is on what happens afterwards.

@minhtuev
Copy link
Contributor Author

minhtuev commented Jul 2, 2024

@swheaton : since the tutorial is about exporting dataset, I think it is safe to assume that the dataset exists. The same code snippet is also shared when clicking the export button on Fiftyone as well.

image

Comment on lines 151 to 157
try:
return Dataset(name, _create=False)
except MissingDatasetError as ex:
if create_if_missing:
return Dataset(name)
else:
raise ex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use fo.dataset_exists() over try-except.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with @sashankaryal 's comment. While it might look nicer, this causes an extra database lookup for the happy path.
We should assume it exists and fall back to creating rather than checking for existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how much do we want to optimize database lookup? I don't think the users will be calling load_dataset very often, probably once at the beginning of the script.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might be surprised. especially when going through pymongo proxy (dont wanna get into it lol). where each call to mongodb must be absolutely necessary haha.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, @sashankaryal what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @sashankaryal, let's go with @swheaton's suggestion

@minhtuev minhtuev marked this pull request as ready for review July 2, 2024 20:45
coderabbitai[bot]

This comment was marked as resolved.

@minhtuev minhtuev force-pushed the feature/update-snippet-export-dataset branch from 21f7a8c to a4fa2b9 Compare July 2, 2024 21:02
coderabbitai[bot]

This comment was marked as resolved.

@minhtuev minhtuev requested a review from brimoor July 2, 2024 21:19
@minhtuev
Copy link
Contributor Author

minhtuev commented Jul 2, 2024

@sashankaryal @benjaminpkane @swheaton for the UI path to generate code to export dataset, do you know where it lives? Somehow I can't find it in either app folder or in the fiftyone python package, I have tried grep -r as well..

image

fiftyone/core/dataset.py Outdated Show resolved Hide resolved
fiftyone/core/dataset.py Outdated Show resolved Hide resolved
@minhtuev minhtuev requested a review from brimoor July 3, 2024 01:46
coderabbitai[bot]

This comment was marked as outdated.

@minhtuev minhtuev force-pushed the feature/update-snippet-export-dataset branch from 487a5fc to 87b446d Compare July 3, 2024 06:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 487a5fc and 87b446d.

Files selected for processing (3)
  • fiftyone/core/dataset.py (4 hunks)
  • tests/unittests/dataset_tests.py (1 hunks)
  • tests/unittests/utils_tests.py (5 hunks)
Files skipped from review as they are similar to previous changes (2)
  • fiftyone/core/dataset.py
  • tests/unittests/dataset_tests.py
Additional comments not posted (3)
tests/unittests/utils_tests.py (3)

462-466: Ensure consistent mocking of dataset_exists.

The dataset_exists mock patch is correctly added to the test_load_dataset_by_id method. Ensure that the mock patch is consistently applied across all relevant test methods.


488-492: Ensure consistent mocking of dataset_exists.

The dataset_exists mock patch is correctly added to the test_load_dataset_by_alt_id method. Ensure that the mock patch is consistently applied across all relevant test methods.


513-515: Ensure consistent mocking of dataset_exists.

The dataset_exists mock patch is correctly added to the test_load_dataset_by_name method. Ensure that the mock patch is consistently applied across all relevant test methods.

Copy link
Contributor

@swheaton swheaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disagree with review comment

@swheaton
Copy link
Contributor

swheaton commented Jul 3, 2024

@minhtuev so that actually is coming from the Teams app which is in voxel-hub repo.
The snippet you're seeing is specifically in voxel-hub/app/packages/app/pages/datasets/\[slug\]/components/ExportView/CodeExport.tsx (backslashes are mine to make it work in unix)

@minhtuev minhtuev requested a review from swheaton July 8, 2024 19:11
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 87b446d and fbf2532.

Files selected for processing (1)
  • fiftyone/core/dataset.py (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • fiftyone/core/dataset.py

fiftyone/core/dataset.py Outdated Show resolved Hide resolved
fiftyone/core/dataset.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fbf2532 and 0c50588.

Files selected for processing (1)
  • fiftyone/core/dataset.py (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • fiftyone/core/dataset.py

@minhtuev minhtuev force-pushed the feature/update-snippet-export-dataset branch from 0c50588 to 32b93af Compare July 10, 2024 17:50
@minhtuev minhtuev requested a review from swheaton July 10, 2024 17:50
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 0c50588 and 32b93af.

Files selected for processing (4)
  • fiftyone/public.py (1 hunks)
  • fiftyone/core/dataset.py (4 hunks)
  • fiftyone/migrations/runner.py (1 hunks)
  • tests/unittests/dataset_tests.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • fiftyone/core/dataset.py
Files skipped from review as they are similar to previous changes (1)
  • tests/unittests/dataset_tests.py
Additional context used
Ruff
fiftyone/__public__.py

36-36: .core.dataset.DatasetNotFoundError imported but unused

Remove unused import

(F401)

Additional comments not posted (1)
fiftyone/migrations/runner.py (1)

62-62: Improved error handling.

Raising DatasetNotFoundError instead of ValueError improves error specificity and clarity.

@minhtuev minhtuev force-pushed the feature/update-snippet-export-dataset branch from 32b93af to df3333d Compare July 10, 2024 18:19
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 32b93af and df3333d.

Files selected for processing (4)
  • fiftyone/public.py (1 hunks)
  • fiftyone/core/dataset.py (4 hunks)
  • fiftyone/migrations/runner.py (1 hunks)
  • tests/unittests/dataset_tests.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • fiftyone/core/dataset.py
  • fiftyone/migrations/runner.py
  • tests/unittests/dataset_tests.py
Additional context used
Ruff
fiftyone/__public__.py

36-36: .core.dataset.DatasetNotFoundError imported but unused

Remove unused import

(F401)

@@ -33,6 +33,7 @@
from .core.config import AppConfig
from .core.dataset import (
Dataset,
DatasetNotFoundError,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused import.

The DatasetNotFoundError import is unused in this file and should be removed to avoid clutter.

-    DatasetNotFoundError,
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DatasetNotFoundError,
Tools
Ruff

36-36: .core.dataset.DatasetNotFoundError imported but unused

Remove unused import

(F401)

Copy link
Contributor

@sashankaryal sashankaryal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New load_dataset LGTM!!

Copy link
Contributor

@swheaton swheaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great thanks

@minhtuev minhtuev merged commit c43fb2f into develop Jul 10, 2024
13 checks passed
@minhtuev minhtuev deleted the feature/update-snippet-export-dataset branch July 10, 2024 20:45
@coderabbitai coderabbitai bot mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants