-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MRG: add Manifest::intersect_manifest
to Rust core
#3305
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## latest #3305 +/- ##
==========================================
- Coverage 86.53% 86.49% -0.04%
==========================================
Files 137 137
Lines 16023 16047 +24
Branches 2728 2728
==========================================
+ Hits 13865 13880 +15
- Misses 1849 1858 +9
Partials 309 309
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Extracted from #3305 This PR adjusts `RevIndex::open` to propagate a `RocksDBError` when opening a non-RocksDB directory. It also modifies the creation printout in `RevIndex::create` to distinguish index completion from interim output, for slightly easier debugging.
@luizirber @bluegenes ready for review! |
Manifest::intersect_manifest
to Rust core
src/core/src/collection.rs
Outdated
@@ -215,6 +215,14 @@ impl Collection { | |||
assert_eq!(sig.signatures.len(), 1); | |||
Ok(sig) | |||
} | |||
|
|||
pub fn intersect_manifest(&self, mf: &Manifest) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be simpler (or complementary) to enable intersect of the collections directly i.e. pass a ref to the Collection (other
) and then use self.manifest.intersect_manifest(other.manifest)
?
I guess it probably doesn't matter as long as we're passing in a reference to either the collection or manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comes directly from the MultiCollection
code use case, where we read a manifest in directly; would be frustrating to have to create a Collection
first.
we might be able to coerce a Collection
into a Manifest
, though, which would make it work more smoothly when you do have a Collection
... I'll look!
src/core/src/collection.rs
Outdated
let manifest = self.manifest.intersect_manifest(mf); | ||
Self { | ||
manifest, | ||
storage: self.storage.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does just cloning the storage mean that we're carrying around all sigs anyway, but we just don't have access to them via the manifest?
This is probably best since we avoid having to find and copy sigs individually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is really a question for @luizirber at the moment. The alternative would involve references and lifetime annotations, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does just cloning the storage mean that we're carrying around all sigs anyway, but we just don't have access to them via the manifest?
This is probably best since we avoid having to find and copy sigs individually?
To be clear: I think the answer is yes, esp if it's e.g. a MemStorage
.
I am worried about duplicating memory unnecessarily. I don't really know how cloning a storage works in practice. I don't like the idea of cloning (which presumably would double the MemStorage
).
Maybe a better approach here would be to have the result of Collection::intersect_manifest
take ownership of storage without a clone. I'd need to change the signature of intersect_manifest
here, will think on't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MemStorage
is the odd one, because it does allocate a lot of new memory, but Storage in general doesn't store much (or any) data, mostly metadata to access the actual data.
@luizirber any strong objections to the code here? |
@luizirber @bluegenes can we merge this? :) |
(not sure why codecov is acting up... tests do cover that code?!) |
I'm ok with merging 🤷🏼♀️ |
Sometimes cargo-tarpaulin doesn't get everything, I've tried #2993 some time ago, maybe time to revisit |
🎉 |
from #2987 (comment): - [x] verify minimum supported rust version (MSRV) for writing release notes (see #2988 for an example); MSRV is checked by CI in `.github/workflows/rust.yml`, `minimum_rust_version` - [x] write release notes using `git log --oneline r0.12.0..latest src/core | cut -d\ -f2- > /tmp/out.txt` - [x] verify that the ChangeLog is up to date: https://github.com/sourmash-bio/sourmash/blob/latest/src/core/CHANGELOG.md - [x] bump version in `src/core/Cargo.toml` # Release notes: ## [0.15.2] - 2024-09-25 MSRV: 1.65 Changes/additions: * add `Manifest::intersect_manifest` to Rust core (#3305) * propagate error from `RocksDB::open` on bad directory (#3306, #3307) Updates: * Bump getset from 0.1.2 to 0.1.3 (#3328) * Bump memmap2 from 0.9.4 to 0.9.5 (#3326) * Bump codspeed-criterion-compat from 2.6.0 to 2.7.2 (#3324) * Bump serde_json from 1.0.127 to 1.0.128 (#3316) * Bump serde from 1.0.209 to 1.0.210 (#3318) * Bump serde from 1.0.208 to 1.0.209 (#3310) * Bump serde_json from 1.0.125 to 1.0.127 (#3309)
This PR implements
Manifest::intersect_manifest
andCollection::intersect_manifest
for the Rust layer, which is needed to support standalone manifests over in sourmash-bio/sourmash_plugin_branchwater#430.As part of this, the PR implements
Eq
andHash
traits forRecord
so thatHashSet
can be used for efficient intersections.Related PRs: