Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

digest subset should not short-circuit when there are ignores involved. #12648

Merged
merged 4 commits into from
Sep 6, 2021

Conversation

kaos
Copy link
Member

@kaos kaos commented Aug 25, 2021

Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se>
Copy link
Member Author

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: these are the first lines of Rust I've ever written, so I have no idea what I'm doing :P

.subset(merged_digest, subset_params4)
.await
.unwrap();
assert_eq!(subset_roland4, snapshot1.digest);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails for some reason, and I've not figured out yet how to inspect the snapshot to see if it is correct or not..

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot manually implements Debug to hide that information, but it doesn't need to. To fix that, can apply a diff like:

diff --git a/src/rust/engine/fs/store/src/snapshot.rs b/src/rust/engine/fs/store/src/snapshot.rs
index a94f033e7..1c0c79bf4 100644
--- a/src/rust/engine/fs/store/src/snapshot.rs
+++ b/src/rust/engine/fs/store/src/snapshot.rs
@@ -19,7 +19,7 @@ use itertools::Itertools;
 
 use crate::Store;
 
-#[derive(Eq, Hash, PartialEq)]
+#[derive(Eq, Debug, Hash, PartialEq)]
 pub struct Snapshot {
   pub digest: Digest,
   pub path_stats: Vec<PathStat>,
@@ -305,17 +305,6 @@ impl Snapshot {
   }
 }
 
-impl fmt::Debug for Snapshot {
-  fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-    write!(
-      f,
-      "Snapshot(digest={:?}, entries={})",
-      self.digest,
-      self.path_stats.len()
-    )
-  }
-}
-
 fn paths_of_child_dir(paths: Vec<PathStat>) -> Vec<PathStat> {
   paths
     .into_iter()

Then, to dump out the Snapshot, you could use:

// In standard debug format.
println!("snapshot: {:?}", snapshot);`
// In pretty-printed debug format:
println!("snapshot: {:#?}", snapshot);`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this test passes, yay :D

.subset(merged_digest, subset_params3)
.await
.unwrap();
assert_eq!(subset_roland3, snapshot1.digest);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, as it should.

@@ -310,6 +310,20 @@ impl GitignoreStyleExcludes {
::ignore::Match::Ignore(_) => true,
}
}

pub fn has_ignored_paths(&self, path: &Path) -> bool {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this equivalent to is_ignored_or_child_of_ignored_path ? If not, it'd be good to understand why not...

Copy link
Member Author

@kaos kaos Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, is_ignored_or_child_of_ignored_path is the inverse (but not logical not) of has_ignored_paths. That is, it answers, if there are any exclude patterns touching path or a child of path, where as the former is for if path is covered by any exclude pattern.

exclude pattern path is_ignored_or_child_of_ignored_path has_ignored_paths
a/ a/b/ yes yes*
a/b/ a/ no yes
a/ b/ no no

* I think my implementation gives a no here, which is wrong.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think that that makes sense. Renaming the method to match the pattern of the others (maybe is_parent_of_ignored_path or something), and adding a bit of explanation in a docstring would be good.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -310,6 +310,20 @@ impl GitignoreStyleExcludes {
::ignore::Match::Ignore(_) => true,
}
}

pub fn has_ignored_paths(&self, path: &Path) -> bool {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think that that makes sense. Renaming the method to match the pattern of the others (maybe is_parent_of_ignored_path or something), and adding a bit of explanation in a docstring would be good.


pub fn has_ignored_paths(&self, path: &Path) -> bool {
match path.to_str() {
None => return false,
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case would be for non-UTF8 path data... haven't seen it before, and am not sure how likely it actually is. But you might consider renaming this method to indicate uncertainty (i.e. maybe_parent_of_ignored_path), and then have this case return true, because we can't be sure.

@@ -425,17 +425,19 @@ impl IntermediateGlobbedFilesAndDirectories {
if (*wildcard == *DOUBLE_STAR_GLOB) || (*wildcard == *SINGLE_STAR_GLOB) {
// Here we short-circuit all cases which would swallow up a directory without
// subsetting it or needing to perform any further recursive work.
let has_ignores = exclude.has_ignored_paths(&directory_path);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a relatively expensive method: maybe move calling it down to: if short_circuit && !exclude.has_ignored_paths(&directory_path) { .. }. Could then rename the let short_circuit flag to let is_short_circuit_pattern.

.subset(merged_digest, subset_params4)
.await
.unwrap();
assert_eq!(subset_roland4, snapshot1.digest);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot manually implements Debug to hide that information, but it doesn't need to. To fix that, can apply a diff like:

diff --git a/src/rust/engine/fs/store/src/snapshot.rs b/src/rust/engine/fs/store/src/snapshot.rs
index a94f033e7..1c0c79bf4 100644
--- a/src/rust/engine/fs/store/src/snapshot.rs
+++ b/src/rust/engine/fs/store/src/snapshot.rs
@@ -19,7 +19,7 @@ use itertools::Itertools;
 
 use crate::Store;
 
-#[derive(Eq, Hash, PartialEq)]
+#[derive(Eq, Debug, Hash, PartialEq)]
 pub struct Snapshot {
   pub digest: Digest,
   pub path_stats: Vec<PathStat>,
@@ -305,17 +305,6 @@ impl Snapshot {
   }
 }
 
-impl fmt::Debug for Snapshot {
-  fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-    write!(
-      f,
-      "Snapshot(digest={:?}, entries={})",
-      self.digest,
-      self.path_stats.len()
-    )
-  }
-}
-
 fn paths_of_child_dir(paths: Vec<PathStat>) -> Vec<PathStat> {
   paths
     .into_iter()

Then, to dump out the Snapshot, you could use:

// In standard debug format.
println!("snapshot: {:?}", snapshot);`
// In pretty-printed debug format:
println!("snapshot: {:#?}", snapshot);`

Signed-off-by: Andreas Stenius <andreas.stenius@svenskaspel.se>

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Member Author

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the full tests ./cargo test without fail, so seems like there's not any tests I've broken with this, at least.


if exclude.maybe_is_parent_of_ignored_path(&directory_path) {
// leave this directory in todo_directories, so we process ignore patterns correctly..
continue;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure about this one... but it did make my test pass. See next comment.

// leave this directory in todo_directories, so we process ignore patterns correctly..
continue;
}

if *wildcard == *DOUBLE_STAR_GLOB {
globbed_directories.insert(directory_path, directory_node.clone());
} else if !globbed_directories.contains_key(&directory_path) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really make sense if what is going on here. As it seems to entirely side step any ignore patterns that may be in effect. Also, I was hitting the else here, adding an empty directory. But that directory was never populated with files, and I haven't figured out yet, when and where that was supposed to happen.

So, by simply not doing any of this, my tests passed, as the directories where processed in full.. maybe this is just an optimization?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, the skipping recursing into subdirectories based on wildcards is optimization. To the extent that it is buggy, we should bias toward correctness.

I had hoped that we would be able to use the same code here as we use for glob matching against a filesystem, but the primary blocking issue for that is exactly that this code is trying to shortcircuit and skip recursing into things (i.e. "match an entire subdirectory"), and that's not an operation that we can do in the filesystem (due to symlinks, etc).

It's possible that the implementations could be merged, but in the meantime: removing some optimization to ensure that this is correct is the only course.

.subset(merged_digest, subset_params4)
.await
.unwrap();
assert_eq!(subset_roland4, snapshot1.digest);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this test passes, yay :D

@kaos kaos requested a review from stuhood September 3, 2021 14:19
Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

Still in draft state: maybe accidentally?

src/rust/engine/fs/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +327 to +328
// In case the pattern is shorter than path, we are inside a ignored tree, so both
// parent and child of ignored paths.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to say "so both parent and child are ignored paths"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, perhaps that sounds better, but I meant to say "of ignored paths".. maybe a bit backwards way of looking at it, I'm ok with either ;)

src/rust/engine/fs/store/src/snapshot_ops.rs Outdated Show resolved Hide resolved
@kaos kaos marked this pull request as ready for review September 3, 2021 20:49
kaos and others added 2 commits September 3, 2021 22:50
Co-authored-by: Stu Hood <stuhood@gmail.com>
Co-authored-by: Stu Hood <stuhood@gmail.com>
@stuhood stuhood enabled auto-merge (squash) September 3, 2021 21:30
@kaos
Copy link
Member Author

kaos commented Sep 6, 2021

bump. retrigger tests?
failed on some java compiler test, feels unrelated. (I'll need this in the Docker backend too.. :P )

@tdyas
Copy link
Contributor

tdyas commented Sep 6, 2021

bump. retrigger tests?

Done.

failed on some java compiler test, feels unrelated. (I'll need this in the Docker backend too.. :P )

The Java tests are flaky and were skipped in a more recent commit. You should rebase the PR to pick up those skips.

@stuhood stuhood merged commit 8936121 into pantsbuild:main Sep 6, 2021
jsirois added a commit to jsirois/pants that referenced this pull request Sep 10, 2021
…uild#12778))

* Add python-docx to the module mapping dictionary ([pantsbuild#12775](pantsbuild#12775))

* Add python-pptx to the module mapping dictionary ([pantsbuild#12776](pantsbuild#12776))

* Add `opencv-python` to the default Python module mapping ([pantsbuild#12777](pantsbuild#12777))

* Add `PyMuPDF` to the default Python module mapping ([pantsbuild#12774](pantsbuild#12774))

* Deprecate `--list-provides` option. ([pantsbuild#12759](pantsbuild#12759))

* Upgrade default `isort` to latest `isort==5.9.3` ([pantsbuild#12756](pantsbuild#12756))

* `OutputPathField.value_or_default()` no longer has an `Address` argument ([pantsbuild#12837](pantsbuild#12837))

* Properly include file dependencies in docker build context ([pantsbuild#12758](pantsbuild#12758))

* DigestSubset should not short-circuit when there are ignores involved. ([pantsbuild#12648](pantsbuild#12648))

* Improve cache reuse for `./pants package` when using a constraints file or lockfile ([pantsbuild#12807](pantsbuild#12807))

* Upgrade to Pex 2.1.48 and leverage packed layout. ([pantsbuild#12808](pantsbuild#12808))

* Fix backports of std lib modules like `dataclasses` not working with dependency inference ([pantsbuild#12818](pantsbuild#12818))

* Warn if `[python-repos]` is set during lockfile generation ([pantsbuild#12800](pantsbuild#12800))

* Add `version` to lockfile metadata headers ([pantsbuild#12788](pantsbuild#12788))

* jvm: add missing space to avoid two words running together ([pantsbuild#12793](pantsbuild#12793))

* Fix a markdown issue in a help string. ([pantsbuild#12766](pantsbuild#12766))

[ci skip-rust]
@kaos kaos mentioned this pull request Sep 12, 2021
@kaos kaos deleted the digest_subset_glob_exclude_bug branch October 5, 2021 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PathGlobs bug when using an exclude and **
3 participants