
Fix COCO category IDs #4884

Merged: 2 commits merged into release/v1.0.1 from bugfix/iss-4615-4795 on Oct 4, 2024

Conversation

@brimoor (Contributor) commented on Oct 3, 2024

Resolves #4615
Resolves #4795
Resolves #4570

#4354 introduced regression #4795, which this PR fixes. I also resolved #4615 while I was at it.

Tested by

Unit tests have been added.

Example usage

import fiftyone as fo
import fiftyone.zoo as foz

# The `classes` parameter works again
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types="detections",
    classes=["chair"],
    max_samples=10,
    only_matching=True,
)

assert len(dataset) == 10
assert dataset.distinct("ground_truth.detections.label") == ["chair"]

# COCO exports now use 1-based category IDs
dataset.export(
    export_dir="/tmp/coco-2017",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    overwrite=True,
)

dataset2 = fo.Dataset.from_dir(
    dataset_dir="/tmp/coco-2017",
    dataset_type=fo.types.COCODetectionDataset,
    label_types="detections",
    label_field="ground_truth",
)

assert [c["id"] for c in dataset2.info["categories"]] == [1]

Summary by CodeRabbit

  • Documentation

    • Enhanced clarity and accuracy of dataset loading instructions for the FiftyOne library, including updates to the COCODetectionDataset format.
    • Expanded instructions for dataset creation with detailed parameter explanations and clearer definitions of dataset types.
    • Updated export documentation to clarify functionality and usage, including support for various formats and new sections on label type coercion and custom exporters.
    • Improved formatting and organization for better readability throughout the documentation.
    • Updated COCO integration documentation to clarify loading, exporting, and evaluating COCO datasets.
  • Tests

    • Added unit tests for category handling and integrity of exported datasets, covering multiple dataset types and ensuring accurate metadata preservation.

@brimoor added the bug (Bug fixes) label on Oct 3, 2024
@coderabbitai bot commented on Oct 3, 2024

Walkthrough

The pull request introduces updates to the FiftyOne library's documentation and codebase, focusing on dataset loading and exporting functionalities. Key changes include modifications to the COCODetectionDataset format, enhancements in user guide clarity, and improvements in method signatures for dataset handling. Additionally, unit tests have been expanded to ensure robustness in category management and dataset integrity during import/export operations.

Changes

  • docs/source/user_guide/dataset_creation/datasets.rst: Updated dataset loading instructions, including COCODetectionDataset format adjustments and expanded guidance.
  • docs/source/user_guide/export_datasets.rst: Enhanced documentation for exporting datasets, including new examples and clarifications on label coercion.
  • docs/source/integrations/coco.rst: Clarified COCO integration documentation, including loading, exporting, and evaluating COCO datasets.
  • fiftyone/utils/coco.py: Modified functions and classes for improved handling of COCO dataset imports and exports, including signature updates.
  • tests/unittests/import_export_tests.py: Added unit tests for category handling and dataset integrity during import/export, covering various dataset types.

Possibly related PRs

  • Add support for exports using existing COCO IDs #4530: The changes in the fiftyone/utils/coco.py file regarding the handling of category_id adjustments and the introduction of a new coco_id parameter for exporting datasets are directly related to the modifications in the main PR, which also discusses changes to the COCODetectionDataset format and category handling.
  • quickstart 3d dataset #4406: The updates in the docs/source/user_guide/using_datasets.rst file regarding dataset loading processes and the handling of grouped datasets may relate to the expanded dataset creation instructions in the main PR, which aims to clarify dataset loading for various formats.
  • Updated snippets for exporting dataset to use load_dataset #4545: The changes in the fiftyone/core/dataset.py file that involve updating the dataset loading methods and ensuring proper handling of existing datasets are relevant to the main PR's focus on enhancing dataset loading instructions and clarifications.

Suggested labels

documentation, feature

Suggested reviewers

  • imanjra
  • benjaminpkane
  • sashankaryal

Poem

In the garden where datasets bloom,
We tidy the paths, make space in the room.
With clearer guides and tests that delight,
Our FiftyOne library shines ever so bright!
Hop along, dear friends, let’s explore and create,
With every new change, we celebrate! 🐇✨



@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (10)
docs/source/user_guide/export_datasets.rst (3)

Line range hint 4-4: Address the TODO comment for adding tests

The TODO comment indicates that tests are missing for this function. It's important to add unit tests to ensure the function behaves correctly for various inputs and edge cases.

Would you like assistance in generating unit tests for this function?


Line range hint 12-24: Refactor discount calculation for fairness and clarity

The current implementation has a few areas for improvement:

  1. The flat $20 fee on discounted bills can lead to situations where customers with a small discount pay more than those without a discount. Consider a percentage-based fee or a sliding scale.

  2. The function doesn't handle edge cases like negative amounts or loyalty years. Add input validation to ensure robust behavior.

  3. The function is quite long. Consider breaking it down into smaller, more focused functions for better readability and maintainability.

  4. There's no handling for very large loyalty years, which could lead to unexpectedly high discounts if not intended.

Would you like assistance in refactoring this function to address these points?


Line range hint 1-24: Consider overall improvements for consistency and robustness

To enhance the overall quality of the code:

  1. Implement consistent error handling and input validation across all functions.
  2. Add docstrings and type hints to improve code clarity and maintainability.
  3. Consider removing the coderabbit_ prefix from function names if it's not serving a specific purpose.
  4. Ensure all functions follow the same style and level of complexity – currently, there's a mix of very simple functions and more complex ones.

These changes will make the code more robust, easier to understand, and maintain in the long run.

docs/source/user_guide/dataset_creation/datasets.rst (7)

Line range hint 1-114: Consider adding a brief explanation of FiftyOne's purpose in the introduction.

The introduction effectively outlines FiftyOne's dataset import capabilities. However, for newcomers, it might be helpful to include a brief explanation of what FiftyOne is and its primary purpose before diving into the specifics of dataset import.

Consider adding a sentence or two at the beginning, such as:

"FiftyOne is a tool for building, analyzing, and improving datasets for computer vision tasks. It provides powerful capabilities for dataset import, visualization, and manipulation."


Line range hint 116-214: Consider grouping dataset types by category for easier navigation.

The table of supported formats is comprehensive and well-structured. To improve usability, especially as the list of supported formats grows, consider grouping the dataset types into categories such as "Image Classification", "Object Detection", "Segmentation", etc. This grouping could help users quickly find the format they're looking for.

Here's an example of how you could structure the grouping:

.. table::
    :widths: 40 60

    +------------------------+-----------------------------------------------------------+
    | Image Classification                                                               |
    +========================+===========================================================+
    | ImageClassification... | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | TFImageClassificat...  | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | Object Detection                                                                   |
    +------------------------+-----------------------------------------------------------+
    | COCODetectionDataset   | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | VOCDetectionDataset    | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    ...

Line range hint 216-2153: Ensure consistency in structure and information across all dataset type sections.

The individual dataset type import sections are comprehensive and provide valuable information. To further improve the documentation, consider standardizing the structure of each section to include the following elements consistently:

  1. Brief description of the dataset type
  2. File structure expected by the importer
  3. Python code example
  4. CLI code example
  5. Notes on specific features or requirements (if applicable)
  6. Link to the corresponding DatasetImporter class for advanced usage

This consistent structure will make it easier for users to find the information they need across different dataset types.

Consider creating a template for dataset type sections to ensure all necessary information is included consistently.


Line range hint 2155-2220: Expand guidance on choosing between custom importers and other methods.

The custom formats section effectively introduces the concept of custom DatasetImporter classes. To provide more value to users, consider expanding this section to include:

  1. A comparison of when to use custom importers vs. the simple Python loop method mentioned at the beginning.
  2. Guidelines or best practices for deciding when to create a custom importer.
  3. A brief example of a simple custom importer to illustrate the concept.

This additional information would help users make informed decisions about the best approach for their specific use case.

Consider adding a subsection titled "When to use custom importers" that outlines the scenarios where custom importers are most beneficial, such as:

  • When you need to reuse the import logic across multiple projects
  • When the import process is complex and requires significant preprocessing
  • When you want to leverage FiftyOne's built-in import utilities and error handling

Line range hint 2222-3203: Consider adding a high-level overview and flowchart for custom importer creation.

The "Writing a custom DatasetImporter" section provides detailed templates and explanations for creating custom importers. To make this complex information more accessible, consider adding:

  1. A high-level overview of the custom importer creation process.
  2. A flowchart or decision tree to guide users in choosing the appropriate importer type for their dataset.
  3. A simple, complete example of a custom importer for a common use case.

These additions would help users grasp the overall concept before diving into the specific implementation details.

Add a subsection at the beginning of "Writing a custom DatasetImporter" with a high-level overview, such as:

Custom Importer Overview
------------------------

Creating a custom importer involves the following steps:

1. Choose the appropriate base class (UnlabeledImageDatasetImporter, LabeledImageDatasetImporter, etc.)
2. Implement required methods (__init__, __len__, __next__, etc.)
3. Define properties (has_dataset_info, label_cls, etc.)
4. Implement optional methods (setup, get_dataset_info, close)

The choice of base class depends on your dataset type:

- UnlabeledImageDatasetImporter: For datasets with images only
- LabeledImageDatasetImporter: For datasets with images and labels
- UnlabeledVideoDatasetImporter: For datasets with videos only
- LabeledVideoDatasetImporter: For datasets with videos and labels
- GroupDatasetImporter: For datasets with grouped samples

Follow the templates below for your specific dataset type.
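
A minimal skeleton consistent with this overview could also be included (a sketch, assuming a labeled-image classification dataset; the index-file parsing is hypothetical, while the methods and properties follow the importer interface listed above):

import fiftyone.core.labels as fol
import fiftyone.utils.data as foud

class MyLabeledImageImporter(foud.LabeledImageDatasetImporter):
    """Sketch of a custom importer for a hypothetical format that stores
    (filepath, label) pairs in an index file."""

    @property
    def label_cls(self):
        return fol.Classification

    @property
    def has_dataset_info(self):
        return False

    @property
    def has_image_metadata(self):
        return False

    def setup(self):
        # Hypothetical: parse (filepath, label) pairs from disk
        self._samples = [("/path/to/image.png", "cat")]

    def __len__(self):
        return len(self._samples)

    def __iter__(self):
        self._iter = iter(self._samples)
        return self

    def __next__(self):
        # Returns (image_path, image_metadata, label) for each sample
        filepath, label = next(self._iter)
        return filepath, None, fol.Classification(label=label)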

Line range hint 3205-3239: Enhance explanation of dataset-level information importance and usage.

The "Importing dataset-level information" section provides valuable information about importing dataset-level metadata. To improve this section:

  1. Add a brief explanation of why dataset-level information is important and how it can be used in FiftyOne.
  2. Provide a simple example of how to implement the get_dataset_info() method in a custom importer.
  3. Clarify how users can access and utilize the imported dataset-level information in their FiftyOne workflows.

These additions would help users understand the significance of this feature and how to leverage it effectively.

Consider adding an example implementation of get_dataset_info():

def get_dataset_info(self):
    return {
        "name": "My Custom Dataset",
        "version": "1.0",
        "classes": ["cat", "dog", "bird"],
        "author": "John Doe",
        "date_created": "2023-04-15",
        "description": "A dataset of pet images for classification",
    }

Also, add a note on how to access this information:

Note: After importing, you can access the dataset-level information using the `dataset.info` attribute or specific properties like `dataset.classes`.

Line range hint 3241-3526: Clarify the relationship between custom Dataset types and Importers.

The "Writing a custom Dataset type" section provides good information on creating custom dataset types. To improve this section:

  1. Explain the relationship between custom Dataset types and custom DatasetImporters more explicitly.
  2. Provide a brief example of how a custom Dataset type and its corresponding Importer work together.
  3. Clarify when users should create a custom Dataset type versus just a custom Importer.

These additions would help users understand the full picture of custom dataset handling in FiftyOne.

Consider adding a subsection at the beginning of "Writing a custom Dataset type" to explain the relationship:

Relationship between Dataset Types and Importers
------------------------------------------------

In FiftyOne, Dataset Types and Importers work together to provide a complete solution for handling custom datasets:

- Dataset Type: Defines the format and structure of the dataset on disk.
- Importer: Implements the logic to read the dataset from disk and create FiftyOne samples.

When you create a custom Dataset Type, you typically need to create a corresponding custom Importer. The Dataset Type class specifies which Importer to use through the `get_dataset_importer_cls()` method.

Create a custom Dataset Type when you want to:
1. Define a new dataset format that can be consistently used across projects.
2. Provide a convenient way to import your custom dataset format using `Dataset.from_dir()`.
3. Enable exporting datasets in your custom format (by also implementing a custom Exporter).

If you only need to import a dataset once or in a very specific scenario, creating just a custom Importer might be sufficient.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 77f664c and 2bedfb3.

📒 Files selected for processing (4)
  • docs/source/user_guide/dataset_creation/datasets.rst (2 hunks)
  • docs/source/user_guide/export_datasets.rst (2 hunks)
  • fiftyone/utils/coco.py (9 hunks)
  • tests/unittests/import_export_tests.py (1 hunks)
🔇 Additional comments (12)
fiftyone/utils/coco.py (10)

567-567: Include class names in dataset info

Setting info["classes"] = _to_classes(classes_map) ensures that the class names are included in the dataset information. This is important for downstream processes that may rely on the class list provided in the dataset metadata.
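As a rough illustration of the effect (a sketch; per this review, _to_classes converts the ID-to-name mapping into a sorted class list, and the class names here are illustrative):

# Sketch of the classes_map -> info["classes"] conversion described above
classes_map = {1: "bird", 2: "cat", 3: "dog"}  # COCO category ID -> name
info = {"classes": sorted(classes_map.values())}  # stand-in for _to_classes
assert info["classes"] == ["bird", "cat", "dog"]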


Line range hint 570-577: Align function call with updated signature

By passing classes_map to _get_matching_image_ids, you align the function call with its updated signature. This ensures that image IDs are correctly filtered based on the provided classes.


906-906: Generate label mapping for dynamic classes

When using dynamic classes (self._dynamic_classes is True), sorting self._classes before generating labels_map_rev ensures consistent category IDs in the exported annotations. This change helps maintain the correct mapping between class names and IDs.


909-910: Handle cases where categories are not provided

Adding elif self.categories is None correctly handles scenarios where categories are not supplied. This prevents overwriting labels_map_rev when categories are provided, ensuring that the category IDs remain consistent.


931-934: Include 'supercategory' in category definitions

Including the 'supercategory': None field in the category definitions adheres to the COCO format specifications. This enhances compatibility with tools and libraries that expect the supercategory field in the category metadata.
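For reference, a category entry in the exported JSON then takes roughly this shape (a sketch; the name is borrowed from the PR example):

# Sketch of a COCO-format category entry including the supercategory field
category = {"id": 1, "name": "chair", "supercategory": None}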


1679-1679: Pass 'all_classes_map' to '_get_images_with_classes'

By passing all_classes_map to _get_images_with_classes, you're ensuring that the correct class ID mappings are used when filtering images based on the specified classes. This helps accurately identify images containing the target classes.


1844-1844: Update '_get_matching_image_ids' function call

Including classes_map in the _get_matching_image_ids function call aligns with the updated function signature. This ensures that image IDs are filtered correctly based on the classes mapping.


1860-1860: Adjust parameters in '_get_images_with_classes' call

Passing classes_map as the last parameter matches the updated signature of _get_images_with_classes. This change ensures that class names are correctly mapped to IDs during image filtering.


1937-1939: Create reverse label mapping from 'classes_map'

Generating labels_map_rev = {c: i for i, c in classes_map.items()} creates a mapping from class names to IDs. This reverse mapping is essential for validating provided class names and filtering images accordingly.


2028-2028: Ensure category IDs start from 1

Returning {c: i for i, c in enumerate(classes, 1)} in _to_labels_map_rev ensures that category IDs start from 1, which aligns with the COCO format requirements where category IDs are positive integers starting from 1.
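A minimal illustration of this mapping, combining the alphabetization noted elsewhere in this review with the quoted comprehension (class names are illustrative):

# Sketch: class names are alphabetized and assigned 1-based category IDs
classes = ["dog", "cat", "bird"]
labels_map_rev = {c: i for i, c in enumerate(sorted(classes), 1)}
assert labels_map_rev == {"bird": 1, "cat": 2, "dog": 3}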

tests/unittests/import_export_tests.py (2)

1320-1339: Test for Alphabetized 1-Based Categories is Correct

The added test correctly verifies that categories are alphabetized and assigned 1-based IDs by default when exporting to COCO format. This ensures that the exported dataset maintains the expected structure.


1340-1378: Test for Loading Matching Classes Behaves as Expected

The test accurately checks the behavior of loading a COCO dataset with specified classes and the only_matching flag. It confirms that when only_matching=False, all labels of the matching samples are retained, and when only_matching=True, only the detections matching the specified classes are included. This ensures the dataset loading process functions as intended.
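
For context, the only_matching=False counterpart of the PR description's example would look roughly like this (a sketch; the exact labels returned depend on the downloaded data):

import fiftyone.zoo as foz

# With only_matching=False, samples containing a chair are loaded, but
# their other detections are retained too (the behavior this test checks)
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types="detections",
    classes=["chair"],
    max_samples=10,
    only_matching=False,
)

assert "chair" in dataset.distinct("ground_truth.detections.label")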

@brimoor changed the base branch from develop to release/v1.0.1 on October 3, 2024 at 16:33
@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
fiftyone/utils/coco.py (1)

Line range hint 1928-1939: Improved class mapping handling

The changes to _get_images_with_classes enhance the function's ability to work with class mappings. The addition of classes_map to the function signature and the creation of labels_map_rev improve the handling of class IDs and labels. The use of class_ids instead of target_class_ids enhances naming consistency.

Consider renaming labels_map_rev to class_id_map or label_to_id_map for better clarity on its purpose.

docs/source/user_guide/dataset_creation/datasets.rst (1)

Line range hint 4-4: TODO: Add tests for this function

There's a TODO comment indicating that tests need to be added for this function. It's important to implement these tests to ensure the function works as expected, especially after the signature change.

Would you like assistance in generating unit tests for this function?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2bedfb3 and f08b8cc.

📒 Files selected for processing (4)
  • docs/source/user_guide/dataset_creation/datasets.rst (2 hunks)
  • docs/source/user_guide/export_datasets.rst (2 hunks)
  • fiftyone/utils/coco.py (9 hunks)
  • tests/unittests/import_export_tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/source/user_guide/export_datasets.rst
🔇 Additional comments (10)
fiftyone/utils/coco.py (5)

567-567: Improvement in class information handling

This addition ensures that the info dictionary contains a "classes" key with a sorted list of class labels. Using _to_classes(classes_map) converts the classes_map to a sorted list, which improves consistency and makes it easier to work with the class information later in the code.


570-570: Enhanced image filtering with class mapping

The addition of classes_map as an argument to _get_matching_image_ids improves the function's ability to filter images based on classes. This change allows for more accurate and efficient image selection when working with specific class requirements.


Line range hint 1844-1860: Consistent update to image filtering function

The changes to _get_matching_image_ids are consistent with the earlier modification. By including classes_map in the function signature and passing it to _get_images_with_classes, the code ensures that class mapping information is used consistently throughout the image filtering process. This improves the accuracy and reliability of class-based image selection.


2028-2028: Simplified and improved class ID mapping

The modification to _to_labels_map_rev using enumerate(classes, 1) is an excellent improvement. This change makes the function more concise and Pythonic while ensuring that class IDs start from 1, which is consistent with the COCO format. This simplification reduces the chance of errors and improves code readability.


Line range hint 1-2391: Overall improvement in COCO dataset handling

The changes made to this file significantly enhance the handling of class mappings and category IDs in COCO dataset operations. The modifications improve consistency, efficiency, and accuracy in filtering images based on classes and working with class information. These updates will lead to more reliable and maintainable code when working with COCO format datasets in FiftyOne.

tests/unittests/import_export_tests.py (5)

1320-1377: LGTM! New tests for COCO dataset handling added.

The changes introduce important new test cases for the COCO detection dataset export and import functionality:

  1. Verifying that categories are alphabetized and use 1-based indexing by default.
  2. Testing the ability to load only matching classes, both with and without including non-matching samples.

These additions enhance the test coverage for critical COCO dataset handling features.


Line range hint 1379-1412: Great addition of test for add_yolo_labels functionality.

This new test function covers important aspects of adding YOLO labels to a dataset:

  1. Standard label addition, verifying that label counts match between original and new fields.
  2. Inclusion of missing labels, ensuring all samples have labels when the include_missing option is used.

The test cases are well-structured and cover both the happy path and an important edge case.


Line range hint 1433-1813: Excellent addition of comprehensive tests for 3D media export.

The new ThreeDMediaTests class provides thorough coverage of 3D media export functionality:

  1. Tests for flat relative and absolute addressed scenes.
  2. Tests for nested relative structures with both flattening and maintaining directory structure.
  3. Verification of exported file structure and content integrity.

Key strengths of this test suite:

  • Uses temporary directories for clean test environments.
  • Covers various realistic scenarios for 3D media organization.
  • Includes detailed assertions to verify both structure and content of exported files.

This addition significantly enhances the test coverage for 3D media export features, which is crucial for ensuring reliability in handling complex 3D data structures.


Line range hint 1320-1813: Summary: Significant enhancement of test coverage for various dataset types and export/import functionalities.

These changes introduce valuable additions to the test suite:

  1. Extended tests for COCO detection dataset handling, including category alphabetization and selective class loading.
  2. New tests for YOLO and COCO label addition functionalities.
  3. Comprehensive test suite for 3D media export covering various file structure scenarios.

These additions significantly improve the robustness of the testing framework, especially for complex use cases and edge scenarios. The new tests will help ensure the reliability and correctness of dataset manipulation and export/import operations across different formats and structures.

Suggestions for future improvements:

  1. Consider addressing the Windows compatibility issue in the COCO label addition test.
  2. As the test file grows, it might be beneficial to consider splitting it into more focused test files for better organization and maintainability.

Overall, these changes represent a substantial improvement in the quality and coverage of the test suite.


Line range hint 1414-1431: Approved: New test for add_coco_labels, but consider Windows compatibility.

This new test function effectively verifies the add_coco_labels functionality by:

  1. Exporting the dataset to COCO format.
  2. Adding COCO labels back to the original dataset.
  3. Verifying that label counts match between original and new fields.

The test structure is sound and covers the core functionality well.

The @skipwindows decorator suggests potential compatibility issues on Windows platforms. It would be beneficial to investigate and address these issues to ensure cross-platform compatibility.

To verify the extent of Windows-specific issues, you can run the following commands:

✅ Verification successful

Approved: New test for add_coco_labels, but consider Windows compatibility.

This new test function effectively verifies the add_coco_labels functionality by:

  1. Exporting the dataset to COCO format.
  2. Adding COCO labels back to the original dataset.
  3. Verifying that label counts match between original and new fields.

The test structure is sound and covers the core functionality well.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all occurrences of @skipwindows in the codebase
rg "@skipwindows" --type python

# Check if there are any Windows-specific handling in the codebase
rg "if os\.name == (?:'nt'|\"nt\")" --type python

Length of output: 149


Script:

#!/bin/bash
# Search for all occurrences of @skipwindows in the codebase
rg "@skipwindows" --type py

# Check if there are any Windows-specific handling in the codebase
rg "if os\.name == (?:'nt'|\"nt\")" --type py

Length of output: 464

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
fiftyone/utils/coco.py (5)

48-48: Approved: Improved parameter naming and documentation

The changes in this segment are well-thought-out and improve the function's alignment with COCO terminology:

  1. Renaming classes to categories is more accurate and consistent with COCO format.
  2. The updated docstring provides clear information about the accepted formats for the categories parameter.
  3. The change in category_id from 2 to 1 in the examples is consistent with COCO's 1-based indexing for category IDs.

Consider adding a brief explanation in the docstring about why category IDs start from 1 (e.g., "COCO uses 1-based indexing for category IDs") to help users understand this convention.

Also applies to: 71-71, 91-91, 112-112, 132-139


916-920: Approved: Improved handling of categories in export process

The changes in the COCODetectionDatasetExporter class enhance the robustness and flexibility of the export process:

  1. The close method now handles the case when self.categories is None, providing a fallback to use self.classes.
  2. The logic for creating the categories list has been updated to ensure correct assignment of category IDs when using a custom labels_map_rev.

These changes improve the exporter's ability to handle various input scenarios.

Consider simplifying the logic for creating the categories list using a list comprehension:

categories = [
    {"id": i, "name": c, "supercategory": None}
    for c, i in sorted(labels_map_rev.items(), key=lambda t: t[1])
]

This change would make the code more concise and potentially more readable.

Also applies to: 941-944


1938-1938: Approved: Enhanced class handling in image filtering

The changes in the _get_images_with_classes function improve its flexibility and efficiency:

  1. The function now uses classes_map instead of all_classes, consistent with earlier changes.
  2. A labels_map_rev dictionary is created for efficient lookup of class IDs by label.
  3. The logic for checking unsupported classes has been updated to use labels_map_rev.

These modifications enhance the function's ability to handle various class mapping scenarios.

Consider using a set comprehension for creating class_ids to potentially improve performance:

class_ids = {labels_map_rev[c] for c in target_classes}

This change would eliminate the need for the separate set() call and might be slightly more efficient.

Also applies to: 1947-1951


2038-2038: Approved: Simplified and COCO-compliant label mapping

The _to_labels_map_rev function has been improved:

  1. The function is now a concise one-line dictionary comprehension.
  2. It uses 1-based indexing for class IDs, which is consistent with COCO's convention.

This change enhances readability and ensures compliance with COCO standards.

Consider adding a brief docstring to explain the function's purpose and the 1-based indexing:

def _to_labels_map_rev(classes):
    """
    Create a reverse mapping of class labels to 1-based category IDs.
    
    Args:
        classes: List of class labels.
    
    Returns:
        Dict mapping class labels to their corresponding 1-based category IDs.
    """
    return {c: i for i, c in enumerate(classes, 1)}

Line range hint 2038-2051: Approved: Improved category parsing with enhanced flexibility

The changes in the _parse_categories function improve its flexibility and readability:

  1. The function now handles the case when classes is None, returning all categories.
  2. The logic for creating the return dictionary has been simplified and made more concise.
  3. The function correctly handles both cases: when specific classes are provided and when all classes should be included.

These modifications make the function more robust and easier to use in various scenarios.

Consider using a dictionary comprehension to make the function even more concise:

def _parse_categories(categories, classes=None):
    classes_map, _ = parse_coco_categories(categories)
    
    if classes is None:
        return classes_map
    
    classes = {classes} if isinstance(classes, str) else set(classes)
    return {c: i for c, i in classes_map.items() if c in classes}

This change would eliminate the need for the separate dictionary creation and potentially improve readability.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f08b8cc and 221d807.

📒 Files selected for processing (3)
  • docs/source/integrations/coco.rst (6 hunks)
  • fiftyone/utils/coco.py (15 hunks)
  • tests/unittests/import_export_tests.py (2 hunks)
🧰 Additional context used
🪛 Ruff
tests/unittests/import_export_tests.py

1822-1822: Ambiguous variable name: l

(E741)

🔇 Additional comments (10)
docs/source/integrations/coco.rst (6)

199-201: Excellent addition of the seed parameter for reproducibility!

The inclusion of the seed parameter in the take() method is a valuable improvement. This change ensures that the same subset of samples can be consistently selected across different runs, which is crucial for reproducibility in data science workflows.


Line range hint 219-230: Updated JSON structure enhances clarity

The revised JSON structure example now includes the categories section with an id field, which better reflects the actual COCO format. This change provides users with a more accurate representation of the data structure they'll be working with.


240-247: Improved annotation example with additional details

The updated annotation example now includes more precise coordinate values and an area field. These additions provide a more comprehensive representation of COCO annotations, which is beneficial for users who need to understand the full structure of the data.


270-272: Valuable information about imported COCO categories

The new lines explaining that COCO categories are also imported provide crucial information for users. This addition helps users understand that the category information is preserved when loading COCO-formatted data, which is important for maintaining the full context of the dataset.
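As a quick illustration of what that preserved metadata looks like after loading (a sketch; the values mirror the PR description's export example):

# After import, COCO category metadata is available via dataset.info
print(dataset.info["categories"])
# e.g. [{"id": 1, "name": "chair", "supercategory": None}]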


319-323: Corrected terminology in mock COCO predictions

The update from class_id to category_id in the mock COCO predictions example is a significant improvement. This change aligns the example with the official COCO format, reducing potential confusion for users familiar with COCO datasets.
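
A mock prediction in the corrected format might look like the following (a sketch; all values are illustrative):

# Sketch of a COCO-format prediction using category_id (not class_id);
# bbox uses COCO's [x, y, width, height] convention
prediction = {
    "image_id": 1,
    "category_id": 1,  # 1-based, per the convention this PR enforces
    "bbox": [10.0, 20.0, 100.0, 50.0],
    "score": 0.95,
}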


325-328: Updated method for adding COCO predictions

The change from classes to categories when adding COCO predictions to the dataset is an important update. This modification ensures consistency with the COCO format and the earlier changes in the document. It's a crucial change that maintains the integrity of the COCO data structure within FiftyOne.
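
In code, the updated call might look roughly like this (a sketch; predictions and categories are assumed to be lists of COCO-format prediction and category dicts):

import fiftyone.utils.coco as fouc

# Add COCO-format predictions to an existing dataset, passing
# `categories` (rather than `classes`) per the updated documentation
fouc.add_coco_labels(dataset, "predictions", predictions, categories)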

fiftyone/utils/coco.py (2)

1854-1854: Approved: Consistent use of classes_map in image ID matching

The changes in the _get_matching_image_ids function maintain consistency with earlier modifications:

  1. The function signature now includes classes_map instead of all_classes.
  2. The call to _get_images_with_classes has been updated to use classes_map.

These changes improve the function's flexibility by relying on the classes_map for class information.

To ensure these changes don't affect existing function calls, let's verify the usage of _get_matching_image_ids in the codebase:

#!/bin/bash
# Search for calls to '_get_matching_image_ids'
echo "Searching for '_get_matching_image_ids' calls:"
rg '_get_matching_image_ids\(' -g '*.py' -C 3

Also applies to: 1870-1871


Line range hint 577-586: Approved: Consistent updates to class initialization and method call

The changes in the COCODetectionDatasetImporter class are consistent with the earlier renaming of classes to categories:

  1. The __init__ method now uses categories instead of classes.
  2. The _get_matching_image_ids method call has been updated to use classes_map instead of all_classes.

These changes improve consistency and reflect the shift towards using a mapping structure for categories.

To ensure these changes don't have unintended consequences, let's verify the usage of categories and classes_map in other parts of the codebase:

✅ Verification successful

Action Required: Re-run updated search scripts to verify usage of categories and classes_map

The previous script encountered an error due to an unrecognized file type and a syntax issue. Please run the corrected shell script below to search for usages of categories and classes_map in Python files:


Verification Complete: No Issues Found with categories and classes_map Usage

The updates to the COCODetectionDatasetImporter class are consistent across the codebase. The usage of categories and classes_map aligns with the renaming and restructuring changes, ensuring no unintended side effects.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usages of 'categories' and 'classes_map' in Python files
echo "Searching for 'categories' usage:"
rg --type python 'categories' -C 2
echo "\nSearching for 'classes_map' usage:"
rg --type python 'classes_map' -C 2

Length of output: 285


Script:

#!/bin/bash
# Search for usages of 'categories' and 'classes_map' in Python files
echo "Searching for 'categories' usage:"
rg 'categories' -g '*.py' -C 2
echo "\nSearching for 'classes_map' usage:"
rg 'classes_map' -g '*.py' -C 2

Length of output: 49566

tests/unittests/import_export_tests.py (2)

1320-1338: Well-structured test for default COCO category handling.

This new test case is a valuable addition to the COCO dataset export functionality. It verifies that when no specific category information is provided, the exported categories are alphabetized and use 1-based indexing, which is the expected default behavior for COCO datasets.


1340-1377: ⚠️ Potential issue

Comprehensive tests for COCO class filtering.

These new test cases effectively cover class filtering scenarios for COCO dataset import. They verify both inclusive and exclusive filtering behaviors, which is crucial for ensuring correct data loading based on specified classes.

However, there's a minor issue with variable naming:

On line 1822, the variable name l is ambiguous. It's best to use more descriptive variable names to improve code readability.

Apply this diff to address the issue:

-        categories = [{"supercategory": "animal", "id": i, "name": l} for i, l in enumerate(classes, 1)]
+        categories = [{"supercategory": "animal", "id": i, "name": label} for i, label in enumerate(classes, 1)]

This change makes the code more readable without affecting its functionality.

Likely invalid or redundant comment.

@benjaminpkane (Contributor) left a comment

LGTM

@brimoor merged commit 17e8e53 into release/v1.0.1 on Oct 4, 2024 (13 checks passed)
@brimoor deleted the bugfix/iss-4615-4795 branch on October 4, 2024 at 21:22