Add support for summary fields #4765

Merged
merged 19 commits into develop from feat/frame-field-rollup-for-video-datasets on Sep 17, 2024

Conversation


@minhtuev minhtuev commented Sep 3, 2024

What changes are proposed in this pull request?

Adds support for creating and managing summary fields on datasets.

How is this patch tested? If it is not, please explain why.

  • Manual tests (✅)
  • Unit tests (✅)

Example usage

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")
dataset.set_field("frames.detections.detections.confidence", F.rand()).save()

# Generate a summary field for object labels
dataset.create_summary_field("frames.detections.detections.label")

# Generate a summary field for [min, max] confidences
dataset.create_summary_field("frames.detections.detections.confidence")

# Generate a summary field for object labels and counts
dataset.create_summary_field(
    "frames.detections.detections.label",
    field_name="frames_detections_label2",
    include_counts=True,
)

# Generate a summary field for per-label [min, max] confidences
dataset.create_summary_field(
    "frames.detections.detections.confidence",
    field_name="frames_detections_confidence2",
    group_by="label",
)

# List the summary fields on the dataset
fo.pprint(dataset.list_summary_fields())

# Inspect contents of these summary fields
fo.pprint(dataset.values("frames_detections_label"))
fo.pprint(dataset.values("frames_detections_confidence"))
fo.pprint(dataset.values("frames_detections_label2"))
fo.pprint(dataset.values("frames_detections_confidence2"))

# Verify that newly created summary fields don't need updating
fo.pprint(dataset.check_summary_fields())

# Modify the source field
label_upper = F("label").upper()
dataset.set_field("frames.detections.detections.label", label_upper).save()

# Verify that the summary fields now need updating
update_fields = dataset.check_summary_fields()
fo.pprint(update_fields)

# Update the summary fields
for field_name in update_fields:
    dataset.update_summary_field(field_name)

# Summary fields are now updated and no longer need updating
fo.pprint(dataset.check_summary_fields())

# Delete summary fields
dataset.delete_summary_fields(dataset.list_summary_fields())
fo.pprint(dataset.list_summary_fields())
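Conceptually, each summary field rolls frame-level values up to the sample level. A minimal pure-Python sketch of the per-label [min, max] confidence rollup (toy data and function name are illustrative, not FiftyOne's actual implementation):

```python
from collections import defaultdict

def summarize_confidence_by_label(frames):
    """Roll frame-level detections up into per-label [min, max] confidence bounds."""
    bounds = defaultdict(lambda: [float("inf"), float("-inf")])
    for frame in frames:
        for det in frame["detections"]:
            lo, hi = bounds[det["label"]]
            c = det["confidence"]
            bounds[det["label"]] = [min(lo, c), max(hi, c)]
    return dict(bounds)

frames = [
    {"detections": [{"label": "cat", "confidence": 0.9}, {"label": "dog", "confidence": 0.4}]},
    {"detections": [{"label": "cat", "confidence": 0.2}]},
]
print(summarize_confidence_by_label(frames))
# {'cat': [0.2, 0.9], 'dog': [0.4, 0.4]}
```

The real implementation computes this in the database via aggregation pipelines rather than in Python, but the resulting sample-level values have this shape.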

Summary by CodeRabbit

  • New Features

    • Added methods for managing summary fields in datasets, including creation, listing, updating, and deletion.
    • Introduced optional parameters for enhanced filtering in schema retrieval functions.
    • Improved tools for summarizing and indexing data, particularly for video datasets.
    • Enhanced user interface for managing sidebar groups and filtering sample fields.
  • Bug Fixes

    • Streamlined error handling in the drop_index method.
  • Tests

    • Added comprehensive tests for frame summary functionalities and summary fields to ensure accurate data handling and reporting.


coderabbitai bot commented Sep 3, 2024


Commits

Files that changed from the base of the PR and between 685fbf0 and b774fcb.

Walkthrough

The pull request introduces new methods for managing summary fields in the fiftyone library, specifically within datasets. These methods include functionalities for listing, creating, checking, updating, and deleting summary fields that aggregate data across samples. Additionally, parameters for enhanced filtering have been added to existing schema retrieval methods. The changes also include updates to documentation and the addition of unit tests to verify the new functionalities.

Changes

Files Change Summary
fiftyone/core/dataset.py Introduced methods for managing summary fields: list_summary_fields, create_summary_field, check_summary_fields, update_summary_field, delete_summary_field, and delete_summary_fields. Modified get_field_schema and get_frame_field_schema for enhanced filtering.
tests/unittests/dataset_tests.py Added test_frame_summaries and test_summary_fields methods to test the functionality of frame summaries and summary fields in datasets.

Possibly related PRs

  • Support large deletions by sample/frame IDs #4787: This PR modifies deletion methods in fiftyone/core/dataset.py, which may interact with the new summary field management methods, particularly in scenarios where summary fields need to be updated or deleted in conjunction with sample/frame deletions.

Poem

🐇 In fields of data, bright and wide,
New keys and dates, we take in stride.
Frame summaries dance, so neat and bright,
With tests to ensure they shine just right.
Hooray for changes, let’s hop and cheer,
For fiftyone's growth, we hold so dear! 🌼



@swheaton swheaton changed the base branch from develop to last-modified-at September 3, 2024 21:05
@swheaton swheaton force-pushed the feat/frame-field-rollup-for-video-datasets branch from c4a19d3 to 24025b2 Compare September 3, 2024 21:10
@minhtuev minhtuev changed the title [Draft] Frame field rollup for video datasets Frame field rollup for video datasets Sep 12, 2024
@minhtuev minhtuev marked this pull request as ready for review September 12, 2024 22:47
@swheaton swheaton force-pushed the feat/frame-field-rollup-for-video-datasets branch from 4f256e0 to 42cc12f Compare September 13, 2024 20:03
minhtuevo added 2 commits September 13, 2024 15:41

1. … fo.StringField
2. Index created for counts mode must have classifications.label
3. Updated docstring
@brimoor brimoor changed the title Frame field rollup for video datasets Add support for inverted indexes Sep 15, 2024
Base automatically changed from last-modified-at to develop September 15, 2024 15:53
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e3c144b and 59ce0f2.

Files selected for processing (7)
  • fiftyone/core/collections.py (6 hunks)
  • fiftyone/core/dataset.py (10 hunks)
  • fiftyone/core/fields.py (15 hunks)
  • fiftyone/core/odm/embedded_document.py (1 hunks)
  • fiftyone/core/odm/mixins.py (3 hunks)
  • fiftyone/core/view.py (6 hunks)
  • tests/unittests/dataset_tests.py (1 hunks)
Additional context used
Ruff
tests/unittests/dataset_tests.py

766-766: Do not assign a lambda expression, use a def

Rewrite to_sets as a def

(E731)


766-766: Ambiguous variable name: l

(E741)

Additional comments not posted (19)
fiftyone/core/odm/embedded_document.py (1)

51-52: LGTM! The addition of the __hash__ method is a useful enhancement.

By implementing the __hash__ method, instances of the DynamicEmbeddedDocument class become hashable. This allows them to be used as keys in dictionaries or elements in sets, expanding their usability in various scenarios.

The implementation, which converts the instance to a string and computes its hash value, is a common and reasonable approach. However, it's important to ensure that the string representation of the instance is unique and consistent across different instances to minimize the risk of hash collisions.

Overall, this change enhances the functionality and flexibility of the DynamicEmbeddedDocument class.
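The pattern being reviewed can be sketched with a toy class (illustrative only; `DynamicDoc` is a stand-in, not FiftyOne's actual `DynamicEmbeddedDocument`):

```python
class DynamicDoc:
    """Toy stand-in for a dynamic embedded document with arbitrary fields."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def __str__(self):
        # Sort keys so equal documents produce identical string representations
        return str(sorted(self.fields.items()))

    def __eq__(self, other):
        return isinstance(other, DynamicDoc) and self.fields == other.fields

    def __hash__(self):
        # Hash the (deterministic) string representation, as in the reviewed change
        return hash(str(self))

a = DynamicDoc(label="cat", confidence=0.9)
b = DynamicDoc(label="cat", confidence=0.9)
assert a == b and hash(a) == hash(b)
assert len({a, b}) == 1  # equal documents collapse in a set
```

Sorting the keys before stringifying is what makes the hash consistent for equal documents, which is the property the review asks to verify.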

fiftyone/core/fields.py (5)

Line range hint 26-107: LGTM!

The new parameters info_keys and created_after are properly validated. The function raises appropriate ValueErrors if the provided values do not meet the expected types.


Line range hint 113-168: LGTM!

The function correctly checks the new info_keys and created_after constraints against the corresponding attributes of the field. It returns False if any of the constraints are not met.

Tools
Ruff

163-170: Return the negated condition directly

Inline condition

(SIM103)


Line range hint 270-362: LGTM!

The function properly incorporates the new info_keys and created_after parameters. It passes them to the validate_constraints function for validation and includes them in the kwargs dictionary when applying the constraints to filter the schema.


Line range hint 368-417: LGTM!

The flatten_schema function has been updated to include the new info_keys and created_after parameters. It passes these parameters to the validate_constraints function for validation and to the _flatten helper function to apply the constraints while flattening the schema.


Line range hint 429-467: LGTM!

The _flatten helper function has been properly updated to handle the new info_keys and created_after parameters. It passes these parameters to the matches_constraints function to filter the fields based on the specified constraints and propagates them recursively when flattening embedded document fields.

fiftyone/core/odm/mixins.py (2)

Line range hint 182-206: LGTM!

The new optional parameters info_keys and created_after expand the functionality of the get_field_schema function, allowing for more granular control over the schema retrieval process. The implementation looks good.


Line range hint 1-38: Looks good!

The function correctly extracts updates for filtered list fields and generates extra updates using the element ID and array filter syntax. It also handles the case where the $set operator becomes empty after extracting the updates. The validation for illegal modifications to the root of filtered list fields is a nice touch.

fiftyone/core/view.py (2)

918-919: LGTM!

The new info_keys and created_after parameters enhance the filtering capabilities of the get_field_schema function by allowing users to specify required keys in the field's info dictionary and a minimum field creation date. This provides more control over the returned schema. The changes are backwards compatible.

Also applies to: 937-940


977-978: LGTM!

The new info_keys and created_after parameters enhance the filtering capabilities of the get_frame_field_schema function by allowing users to specify required keys in the frame field's info dictionary and a minimum frame field creation date. This provides more control over the returned frame schema. The changes are backwards compatible.

Also applies to: 998-1001
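What this constraint-based filtering amounts to can be sketched in plain Python (the schema dicts and attribute names here are hypothetical, not FiftyOne's actual field objects):

```python
from datetime import datetime

def filter_schema(schema, info_keys=None, created_after=None):
    """Keep only fields whose info dict contains all `info_keys` and that
    were created strictly after `created_after`."""
    out = {}
    for name, field in schema.items():
        if info_keys is not None and not all(k in field["info"] for k in info_keys):
            continue
        if created_after is not None and field["created_at"] <= created_after:
            continue
        out[name] = field
    return out

schema = {
    "label": {"info": {"summary": True}, "created_at": datetime(2024, 9, 10)},
    "tags": {"info": {}, "created_at": datetime(2024, 1, 1)},
}
print(filter_schema(schema, info_keys=["summary"]))  # only "label" survives
```

This is how summary fields can be discovered later: they are tagged with a known info key at creation time, then retrieved by filtering the schema on that key.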

tests/unittests/dataset_tests.py (1)

715-931: Comprehensive test for frame summaries functionality.

The test_frame_summaries method thoroughly tests the creation, management, and validation of frame summaries in the Dataset class. It covers various scenarios, including:

  • Generating frame summaries at dataset and frame levels with different configurations.
  • Asserting the correctness of the generated summaries against expected values.
  • Verifying the read-only status of summary fields.
  • Checking the creation of appropriate database indexes.
  • Updating dataset fields and ensuring frame summaries are updated accordingly.
  • Dropping frame summaries and confirming their removal.

The test method is well-structured, follows good practices for unit testing, and provides comprehensive coverage of the frame summaries functionality.

Tools
Ruff

766-766: Do not assign a lambda expression, use a def

Rewrite to_sets as a def

(E731)


766-766: Ambiguous variable name: l

(E741)

fiftyone/core/dataset.py (6)

1647-1658: LGTM!

This is a straightforward and useful function to list the frame summaries that have been created on the dataset. Using get_field_schema with flat=True and info_keys=_FRAME_SUMMARY_KEY to retrieve just the relevant frame summary fields is the right approach.


1660-1738: Excellent work!

This is a very comprehensive and well-designed function for creating frame summaries on a dataset. It covers all the key functionality and customization options one would expect, including:

  • Handling both categorical and numeric field types
  • Customizing the summary field name, sidebar group, counts, grouping, read-only status, indexing, and overwriting behavior
  • Using efficient aggregation pipelines to compute the summaries
  • Robust sidebar group management to ensure the summary field is added to an appropriate location
  • Safely defaulting to making the summary fields read-only

The implementation looks solid and I don't see any issues. Great job!


2026-2050: LGTM!

This function properly deletes a frame summary field from the dataset. Unsetting the read-only status before deleting is the correct approach, and the early return if the field doesn't exist is a nice optimization.


2052-2087: Looks good!

This is a handy function to identify frame summaries that may be outdated due to modifications made to the source frames or samples since the summary was created. Comparing the creation timestamp of the summary to the last modification timestamp of the frames/samples is a sensible approach.

Properly handling both frame-level and sample-level fields is important since frame summaries can be created from either.

No issues found - good work!


Line range hint 2088-2097: LGTM!

This is a straightforward extension of the existing _add_implied_sample_field to allow adding implied frame fields too.

The check that the dataset supports frame fields is good defensive programming. Reloading the dataset to update its schema after adding the field is the right step.

Looks good!


Line range hint 2099-2114: Looks good to me!

This function extends _merge_sample_field_schema to allow merging frame field schemas into the dataset as well.

The check that the dataset supports frame fields is a good safeguard. Only calling _reload() if the schema actually expanded is a nice optimization to avoid unnecessary work.

No issues found, good implementation!

Tools
Ruff

48-48: fiftyone.core.odm.dataset.SampleFieldDocument imported but unused

Remove unused import: fiftyone.core.odm.dataset.SampleFieldDocument

(F401)

fiftyone/core/collections.py (2)

1406-1407: LGTM!

The new info_keys and created_after parameters provide useful ways to filter the returned schema based on field metadata and creation time.


1449-1450: Looks good!

The info_keys and created_after parameters have been added to get_frame_field_schema as well, providing a consistent way to filter frame-level fields across the API.

(resolved review thread on tests/unittests/dataset_tests.py)
@brimoor brimoor changed the title Add support for inverted indexes Add support for summary fields Sep 16, 2024
@@ -3692,7 +4230,7 @@ def _save(self, view=None, fields=None):
self._deleted = True
raise ValueError("Dataset '%s' is deleted" % name)

-    def _save_field(self, field):
+    def _save_field(self, field, _enforce_read_only=True):
minhtuev (Contributor Author):
I assume the reason for us to pass this as a private argument is that we don't want to publicly publish it?

Contributor reply:

In general, private methods are already "undocumented, call at your own risk" methods. But when it comes to bypassing things like read-only constraints, just prefer to send an extra signal, even to internal callers, to not use this parameter unless you reallllly know what you're doing
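The escape-hatch pattern discussed above — an underscore-prefixed keyword argument that lets internal callers deliberately bypass a safety check — can be sketched as follows (toy classes, not the actual FiftyOne code):

```python
class Field:
    """Toy field with a read-only flag."""

    def __init__(self, name, read_only=False):
        self.name = name
        self.read_only = read_only
        self.saved = 0

def save_field(field, _enforce_read_only=True):
    """Persist a field; refuses read-only fields unless the caller explicitly opts out."""
    if _enforce_read_only and field.read_only:
        raise ValueError(f"Cannot edit read-only field '{field.name}'")
    field.saved += 1

ro = Field("summary", read_only=True)
try:
    save_field(ro)  # default path: read-only is enforced
except ValueError:
    pass
save_field(ro, _enforce_read_only=False)  # internal callers only, as the reviewer notes
assert ro.saved == 1
```

The leading underscore signals "even internally, think twice": ordinary call sites never pass it, and the one place that must bypass the check (updating a read-only summary field) opts out explicitly.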

for field_name in update_fields:
    dataset.update_summary_field(field_name)

update_fields = dataset.check_summary_fields()
minhtuev (Contributor Author):

nice!

(outdated, resolved review thread on fiftyone/core/dataset.py)
field_name,
fof.ListField,
subfield=fof.EmbeddedDocumentField,
embedded_doc_type=foo.DynamicEmbeddedDocument,
minhtuev (Contributor Author):

Noob question - what is dynamic embedded document?

Contributor reply: (content not captured in this export)

minhtuev (Contributor Author):

Haha nice, the Fiftyone nested document doll ~

minhtuev (Contributor Author):

Thanks @brimoor - this is awesome! I have a couple of thoughts:

  1. create_summary_field is getting quite long, we should either consider refactoring it or add more comments to it. If we choose to refactor, we can do it in a separate PR (with TODO) so we can close this one.
  2. If we delete all summary fields, does the summary sidebar group disappear automatically?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2b25199 and 6996af5.

Files selected for processing (2)
  • docs/source/user_guide/app.rst (2 hunks)
  • docs/source/user_guide/using_datasets.rst (1 hunks)
Additional comments not posted (6)
docs/source/user_guide/app.rst (4)

366-394: Excellent addition of the dataset filtering feature!

The new "Filtering sample fields" section provides clear instructions on using the App's UI to conveniently filter datasets by various fields.

Automatically making custom attributes filterable is a great usability enhancement. The ability to convert filters to view stages also seems very handy.

Overall, this feature will make it much easier to quickly drill down into datasets of interest. Nice work!


512-518: Resolved: Unassigned fields behavior is documented

The note on lines 512-518 directly addresses the suggestion from the previous comment.

It clearly states that fields not explicitly assigned to any sidebar group will be dynamically added to default groups in the App at runtime.

This provides the missing information and nicely complements the sidebar groups documentation. No further clarification is needed.


395-445: Great guidance on optimizing filters with indexes!

The new section provides clear recommendations and code examples for leveraging dataset indexes to speed up sidebar filters in the App.

The advice to create separate indexes for group datasets to optimize filters in both "group" and "slice" modes is very helpful.

Mentioning the use of summary fields for efficient filtering on video datasets is also a nice tip.

Let's verify that the recommended indexes are being created properly:

#!/bin/bash
# Description: Verify that the recommended indexes are created.

# Test: Search for the `ground_truth.detections.label` index
rg --type py -F -A 5 'dataset.create_index("ground_truth.detections.label")'

# Test: Search for the compound index on `group.name` and `detections.detections.label`
rg --type py -F -A 5 'dataset.create_index([("group.name", 1), ("detections.detections.label", 1)])'

451-511: Sidebar groups customization is a great addition!

The new "Sidebar groups" section thoroughly explains the feature for visually creating, modifying, and persisting custom sidebar groups in the App.

Being able to configure sidebar groups programmatically through the dataset's App config is also very useful for more advanced customization.

The code examples are clear and easy to follow.

One suggestion for improvement:

It would be helpful to clarify what happens to dataset fields that are not explicitly assigned to any sidebar group. Are they automatically added to a default group?

Let's check if this behavior is documented anywhere:

#!/bin/bash
# Description: Check for documentation on behavior of fields not assigned to sidebar groups.

# Test: Search for any text related to the behavior of unassigned fields
rg -i -F "fields that do not appear in the dataset's sidebar_groups"
docs/source/user_guide/using_datasets.rst (2)

1636-1679: LGTM!

This is a clear and informative introduction to summary fields in FiftyOne. The explanations of the concept and benefits are easy to understand. The examples showing how to create summary fields for categorical and numeric input fields are helpful.


1757-1810: Looks good!

This section provides a nice set of examples demonstrating how to manage summary fields:

  • Listing summary fields with list_summary_fields()
  • Checking if summary fields need updating with check_summary_fields()
  • Updating summary fields with update_summary_field()
  • Deleting summary fields with delete_summary_field() and delete_summary_fields()

The explanations are clear and the code snippets are easy to understand.

(resolved review thread on docs/source/user_guide/app.rst)
swheaton (Contributor):

Thank you for the detailed review @minhtuev 🙌🏼

brimoor commented Sep 17, 2024

  1. If we delete all summary fields, does the summary sidebar group disappear automatically?

If a sidebar group becomes empty, it will remain in the App sidebar with an X button that the user can click to remove the group if they want, so it's easy to get rid of.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Outside diff range and nitpick comments (2)
fiftyone/core/dataset.py (2)

1660-1734: Add a code comment explaining the purpose of _SUMMARY_FIELD_KEY.

The create_summary_field function looks good overall and handles the creation and population of summary fields based on the provided configuration.

One suggestion is to add a code comment explaining the purpose and usage of the _SUMMARY_FIELD_KEY info key, which is used to identify and store metadata about summary fields. This will make the code more self-explanatory and easier to maintain.


1915-1917: Add a comment explaining the purpose of unwinding list fields.

Consider adding a code comment to explain why the pipeline unwinds list fields using the $unwind stage. This will make it clearer to future readers that unwinding is necessary to handle summary fields that are nested within list fields in the dataset.
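The effect of the `$unwind` stage can be illustrated with a pure-Python equivalent that flattens a list-valued field so each element can be aggregated individually (toy documents, not the actual pipeline):

```python
def unwind(docs, key):
    """Emit one document per element of docs[key], mirroring MongoDB's $unwind stage."""
    for doc in docs:
        for item in doc.get(key, []):
            flat = dict(doc)
            flat[key] = item  # the list field is replaced by a single element
            yield flat

docs = [{"id": 1, "labels": ["cat", "dog"]}, {"id": 2, "labels": ["cat"]}]
flattened = list(unwind(docs, "labels"))
print(len(flattened))  # 3
```

Without this step, grouping or min/max aggregation would operate on whole lists rather than on the individual nested values the summary field is meant to describe.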

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6996af5 and 685fbf0.

Files selected for processing (1)
  • fiftyone/core/dataset.py (12 hunks)
Additional comments not posted (7)
fiftyone/core/dataset.py (7)

1647-1658: LGTM!

The list_summary_fields function looks good. It retrieves the summary fields by filtering the dataset's field schema for fields with the _SUMMARY_FIELD_KEY info key, sorts the field names, and returns them as a list.


1894-2037: Efficient use of aggregation pipelines to compute summary field values.

The _populate_summary_field function makes good use of MongoDB aggregation pipelines to efficiently compute the summary field values based on the provided configuration. The pipelines handle various scenarios, such as categorical and numeric fields, sample-level and frame-level fields, and optional grouping and counting.

Using aggregation pipelines allows for optimized computation of the summaries directly in the database, avoiding the need to load all the data into memory.


2038-2050: Effective use of last modified timestamps to determine summary field updates.

The check_summary_fields function implements a useful mechanism to determine which summary fields may need to be updated based on the last modified timestamps of the samples and frames in the dataset.

By comparing the timestamps, the function identifies summary fields that were last generated before the most recent modifications to the dataset, indicating that they might be outdated and need to be refreshed.

This approach helps optimize the summary field update process by only updating the necessary fields instead of regenerating all summaries unconditionally.
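The timestamp comparison described above can be sketched in a few lines (hypothetical field records, not FiftyOne's internals):

```python
from datetime import datetime

def stale_summary_fields(summary_fields, last_modified_at):
    """Return the names of summary fields generated before the dataset's
    most recent modification, i.e. those that may need updating."""
    return sorted(
        name
        for name, created_at in summary_fields.items()
        if created_at < last_modified_at
    )

summaries = {
    "frames_detections_label": datetime(2024, 9, 10),
    "frames_detections_confidence": datetime(2024, 9, 16),
}
print(stale_summary_fields(summaries, datetime(2024, 9, 15)))
# ['frames_detections_label']
```

This is a heuristic, not a diff: a field may be flagged even if the modification did not touch its source field, which is exactly the conservative behavior a staleness check should have.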


2086-2107: Appropriate handling of updating a specific summary field.

The update_summary_field function provides a straightforward way to update a single summary field based on the current values of its source field.

The function takes the necessary steps to ensure data integrity and consistency:

  1. It reloads the dataset to mitigate potential concurrency issues.
  2. It validates that the provided field name corresponds to an existing summary field.
  3. It updates the last_modified_at timestamp to reflect the update operation.
  4. It saves the updated field document, allowing the read-only status to be overridden.
  5. It calls the _populate_summary_field method to recompute the summary field values.

Overall, the function handles the update process effectively and ensures that the summary field is refreshed with the latest data.


2109-2120: Straightforward wrapper function for deleting a summary field.

The delete_summary_field function serves as a simple wrapper around the _delete_summary_fields method, providing a convenient way to delete a single summary field from all samples in the dataset.

It directly passes the provided field_name and error_level arguments to the underlying method, allowing for consistent behavior and error handling.

The function's purpose is clear, and its implementation is concise and readable.


2122-2133: Straightforward wrapper function for deleting multiple summary fields.

The delete_summary_fields function serves as a simple wrapper around the _delete_summary_fields method, providing a convenient way to delete multiple summary fields from all samples in the dataset.

It allows for flexibility by accepting either a single field name or an iterable of field names through the field_names argument. This enables the caller to delete one or more summary fields in a single operation.

The function directly passes the field_names and error_level arguments to the underlying method, ensuring consistent behavior and error handling.

Overall, the function's purpose is clear, and its implementation is concise and readable.


2135-2155: Robust implementation for deleting summary fields with error handling.

The _delete_summary_fields function provides a robust implementation for deleting summary fields from all samples in the dataset. It handles various scenarios and incorporates error handling based on the provided error_level.

Key aspects of the function:

  1. It converts the field_names argument to a list, allowing for consistent handling of single field names and iterables.
  2. It validates each field name to ensure it corresponds to an existing summary field. If a field is not a valid summary field, it handles the error based on the error_level using the fou.handle_error function.
  3. It sets the read_only attribute of each field to False before deletion, so that read-only summary fields can be removed.
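The error_level dispatch described above follows a common raise/warn/ignore pattern; a toy sketch (the 0/1/2 convention and the simplified function are assumptions for illustration, not the actual fou.handle_error or dataset internals):

```python
import warnings

def handle_error(msg, error_level):
    """0 => raise, 1 => warn, 2 => ignore (assumed convention)."""
    if error_level == 0:
        raise ValueError(msg)
    if error_level == 1:
        warnings.warn(msg)

def delete_summary_fields(fields, field_names, error_level=0):
    """Toy deletion loop: validate each name, dispatch errors per error_level."""
    for name in list(field_names):
        if name not in fields:
            handle_error(f"'{name}' is not a summary field", error_level)
            continue
        fields.pop(name)

fields = {"a": 1, "b": 2}
delete_summary_fields(fields, ["a", "missing"], error_level=2)  # invalid name ignored
assert fields == {"b": 2}
```

The error_level knob lets callers choose between strict validation (raise on the first bad name) and best-effort cleanup (warn or silently skip).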

(resolved review thread on fiftyone/core/dataset.py)
@minhtuev minhtuev merged commit 5e1af87 into develop Sep 17, 2024
13 of 14 checks passed
@minhtuev minhtuev deleted the feat/frame-field-rollup-for-video-datasets branch September 17, 2024 18:25
sashankaryal pushed a commit that referenced this pull request Sep 18, 2024
* Added support for summary fields which allow the users to collect frame-level information at the sample level
* Added documentation and unit tests for adding, querying, and deleting summary fields
* Updated retrieving field schema by info_keys and created_after
* Updated drop_index to not throw error if the index does not exist

Co-authored-by: brimoor <brimoor@umich.edu>
Co-authored-by: Stuart Wheaton <stuart@voxel51.com>
Co-authored-by: minhtuevo <minhtuev@voxel51.com>