-
-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
digest subset should not short-circuit when there are ignores involved. #12648
Conversation
Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: these are the first lines of Rust I've ever written, so I have no idea what I'm doing :P
.subset(merged_digest, subset_params4) | ||
.await | ||
.unwrap(); | ||
assert_eq!(subset_roland4, snapshot1.digest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails for some reason, and I've not figured out yet how to inspect the snapshot to see if it is correct or not..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snapshot
manually implements Debug
to hide that information, but it doesn't need to. To fix that, can apply a diff like:
diff --git a/src/rust/engine/fs/store/src/snapshot.rs b/src/rust/engine/fs/store/src/snapshot.rs
index a94f033e7..1c0c79bf4 100644
--- a/src/rust/engine/fs/store/src/snapshot.rs
+++ b/src/rust/engine/fs/store/src/snapshot.rs
@@ -19,7 +19,7 @@ use itertools::Itertools;
use crate::Store;
-#[derive(Eq, Hash, PartialEq)]
+#[derive(Eq, Debug, Hash, PartialEq)]
pub struct Snapshot {
pub digest: Digest,
pub path_stats: Vec<PathStat>,
@@ -305,17 +305,6 @@ impl Snapshot {
}
}
-impl fmt::Debug for Snapshot {
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- write!(
- f,
- "Snapshot(digest={:?}, entries={})",
- self.digest,
- self.path_stats.len()
- )
- }
-}
-
fn paths_of_child_dir(paths: Vec<PathStat>) -> Vec<PathStat> {
paths
.into_iter()
Then, to dump out the Snapshot
, you could use:
// In standard debug format.
println!("snapshot: {:?}", snapshot);`
// In pretty-printed debug format:
println!("snapshot: {:#?}", snapshot);`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this test passes, yay :D
.subset(merged_digest, subset_params3) | ||
.await | ||
.unwrap(); | ||
assert_eq!(subset_roland3, snapshot1.digest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, as it should.
src/rust/engine/fs/src/lib.rs
Outdated
@@ -310,6 +310,20 @@ impl GitignoreStyleExcludes { | |||
::ignore::Match::Ignore(_) => true, | |||
} | |||
} | |||
|
|||
pub fn has_ignored_paths(&self, path: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this equivalent to is_ignored_or_child_of_ignored_path
? If not, it'd be good to understand why not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, is_ignored_or_child_of_ignored_path
is the inverse (but not logical not) of has_ignored_paths
. That is, it answers, if there are any exclude patterns touching path
or a child of path
, where as the former is for if path
is covered by any exclude pattern.
exclude pattern | path | is_ignored_or_child_of_ignored_path | has_ignored_paths |
---|---|---|---|
a/ | a/b/ | yes | yes* |
a/b/ | a/ | no | yes |
a/ | b/ | no | no |
*
I think my implementation gives a no
here, which is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think that that makes sense. Renaming the method to match the pattern of the others (maybe is_parent_of_ignored_path
or something), and adding a bit of explanation in a docstring would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/rust/engine/fs/src/lib.rs
Outdated
@@ -310,6 +310,20 @@ impl GitignoreStyleExcludes { | |||
::ignore::Match::Ignore(_) => true, | |||
} | |||
} | |||
|
|||
pub fn has_ignored_paths(&self, path: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think that that makes sense. Renaming the method to match the pattern of the others (maybe is_parent_of_ignored_path
or something), and adding a bit of explanation in a docstring would be good.
src/rust/engine/fs/src/lib.rs
Outdated
|
||
pub fn has_ignored_paths(&self, path: &Path) -> bool { | ||
match path.to_str() { | ||
None => return false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case would be for non-UTF8 path data... haven't seen it before, and am not sure how likely it actually is. But you might consider renaming this method to indicate uncertainty (i.e. maybe_parent_of_ignored_path
), and then have this case return true
, because we can't be sure.
@@ -425,17 +425,19 @@ impl IntermediateGlobbedFilesAndDirectories { | |||
if (*wildcard == *DOUBLE_STAR_GLOB) || (*wildcard == *SINGLE_STAR_GLOB) { | |||
// Here we short-circuit all cases which would swallow up a directory without | |||
// subsetting it or needing to perform any further recursive work. | |||
let has_ignores = exclude.has_ignored_paths(&directory_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a relatively expensive method: maybe move calling it down to: if short_circuit && !exclude.has_ignored_paths(&directory_path) { .. }
. Could then rename the let short_circuit
flag to let is_short_circuit_pattern
.
.subset(merged_digest, subset_params4) | ||
.await | ||
.unwrap(); | ||
assert_eq!(subset_roland4, snapshot1.digest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snapshot
manually implements Debug
to hide that information, but it doesn't need to. To fix that, can apply a diff like:
diff --git a/src/rust/engine/fs/store/src/snapshot.rs b/src/rust/engine/fs/store/src/snapshot.rs
index a94f033e7..1c0c79bf4 100644
--- a/src/rust/engine/fs/store/src/snapshot.rs
+++ b/src/rust/engine/fs/store/src/snapshot.rs
@@ -19,7 +19,7 @@ use itertools::Itertools;
use crate::Store;
-#[derive(Eq, Hash, PartialEq)]
+#[derive(Eq, Debug, Hash, PartialEq)]
pub struct Snapshot {
pub digest: Digest,
pub path_stats: Vec<PathStat>,
@@ -305,17 +305,6 @@ impl Snapshot {
}
}
-impl fmt::Debug for Snapshot {
- fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
- write!(
- f,
- "Snapshot(digest={:?}, entries={})",
- self.digest,
- self.path_stats.len()
- )
- }
-}
-
fn paths_of_child_dir(paths: Vec<PathStat>) -> Vec<PathStat> {
paths
.into_iter()
Then, to dump out the Snapshot
, you could use:
// In standard debug format.
println!("snapshot: {:?}", snapshot);`
// In pretty-printed debug format:
println!("snapshot: {:#?}", snapshot);`
Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se> # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the full tests ./cargo test
without fail, so seems like there's not any tests I've broken with this, at least.
|
||
if exclude.maybe_is_parent_of_ignored_path(&directory_path) { | ||
// leave this directory in todo_directories, so we process ignore patterns correctly.. | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure about this one... but it did make my test pass. See next comment.
// leave this directory in todo_directories, so we process ignore patterns correctly.. | ||
continue; | ||
} | ||
|
||
if *wildcard == *DOUBLE_STAR_GLOB { | ||
globbed_directories.insert(directory_path, directory_node.clone()); | ||
} else if !globbed_directories.contains_key(&directory_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't really make sense if what is going on here. As it seems to entirely side step any ignore patterns that may be in effect. Also, I was hitting the else here, adding an empty directory. But that directory was never populated with files, and I haven't figured out yet, when and where that was supposed to happen.
So, by simply not doing any of this, my tests passed, as the directories where processed in full.. maybe this is just an optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the skipping recursing into subdirectories based on wildcards is optimization. To the extent that it is buggy, we should bias toward correctness.
I had hoped that we would be able to use the same code here as we use for glob matching against a filesystem, but the primary blocking issue for that is exactly that this code is trying to shortcircuit and skip recursing into things (i.e. "match an entire subdirectory"), and that's not an operation that we can do in the filesystem (due to symlinks, etc).
It's possible that the implementations could be merged, but in the meantime: removing some optimization to ensure that this is correct is the only course.
.subset(merged_digest, subset_params4) | ||
.await | ||
.unwrap(); | ||
assert_eq!(subset_roland4, snapshot1.digest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this test passes, yay :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good!
Still in draft state: maybe accidentally?
// In case the pattern is shorter than path, we are inside a ignored tree, so both | ||
// parent and child of ignored paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to say "so both parent and child are ignored paths"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, perhaps that sounds better, but I meant to say "of ignored paths".. maybe a bit backwards way of looking at it, I'm ok with either ;)
Co-authored-by: Stu Hood <stuhood@gmail.com>
Co-authored-by: Stu Hood <stuhood@gmail.com>
bump. retrigger tests? |
Done.
The Java tests are flaky and were skipped in a more recent commit. You should rebase the PR to pick up those skips. |
…uild#12778)) * Add python-docx to the module mapping dictionary ([pantsbuild#12775](pantsbuild#12775)) * Add python-pptx to the module mapping dictionary ([pantsbuild#12776](pantsbuild#12776)) * Add `opencv-python` to the default Python module mapping ([pantsbuild#12777](pantsbuild#12777)) * Add `PyMuPDF` to the default Python module mapping ([pantsbuild#12774](pantsbuild#12774)) * Deprecate `--list-provides` option. ([pantsbuild#12759](pantsbuild#12759)) * Upgrade default `isort` to latest `isort==5.9.3` ([pantsbuild#12756](pantsbuild#12756)) * `OutputPathField.value_or_default()` no longer has an `Address` argument ([pantsbuild#12837](pantsbuild#12837)) * Properly include file dependencies in docker build context ([pantsbuild#12758](pantsbuild#12758)) * DigestSubset should not short-circuit when there are ignores involved. ([pantsbuild#12648](pantsbuild#12648)) * Improve cache reuse for `./pants package` when using a constraints file or lockfile ([pantsbuild#12807](pantsbuild#12807)) * Upgrade to Pex 2.1.48 and leverage packed layout. ([pantsbuild#12808](pantsbuild#12808)) * Fix backports of std lib modules like `dataclasses` not working with dependency inference ([pantsbuild#12818](pantsbuild#12818)) * Warn if `[python-repos]` is set during lockfile generation ([pantsbuild#12800](pantsbuild#12800)) * Add `version` to lockfile metadata headers ([pantsbuild#12788](pantsbuild#12788)) * jvm: add missing space to avoid two words running together ([pantsbuild#12793](pantsbuild#12793)) * Fix a markdown issue in a help string. ([pantsbuild#12766](pantsbuild#12766)) [ci skip-rust]
From slack thread: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1629826664300300
Fixes #12661.