Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Library cannot generate manifest entries for or validate bags with hidden (.) payload files #51

Closed
ssciolla opened this issue Jul 8, 2024 · 2 comments · Fixed by #52

Comments

@ssciolla
Copy link
Contributor

ssciolla commented Jul 8, 2024

Hi, we seem to have discovered today when validating a bag generated by another BagIt implementation (bagit-python) that hidden files -- or those starting with ., such as .keep -- are not handled by this library. Essentially, the bag included hidden files in the manifest and payload, but the library didn't find them, resulting in a failed completeness check. I believe the following line is the source of the issue. Because it's also used by the manifest! method, hidden files would also not be included in the manifest when the bag is generated by the library.

bagit/lib/bagit/bag.rb

Lines 39 to 41 in 4a7fb6d

def bag_files
Dir[File.join(data_dir, "**", "*")].select { |f| File.file? f }
end

A fix would likely involve using Dir.glob with a specific pattern that would include these; the Ruby docs suggest using '{*,.*}', so for this library we might use '**/{*,.*}'. Dir[] might also still work with the updated pattern. There is a File::FNM_DOTMATCH flag, but it would also cause . and .. to be included (which we don't want).

In reviewing the spec, I don't see any indication that these files should be ignored or not included. If people seem to agree this is an issue, I'm happy to open a PR with some added/updated tests. If I'm incorrect somewhere in this analysis, I'm also happy to be pointed in the right direction.

Resources

Update: In thinking more about this, this may be considered a breaking change for current users of the library, as the bags they previously generated using the library would now be considered invalid by the "fixed" version. That would have to be handled somehow, assuming this change is a wanted improvement.

@little9
Copy link
Contributor

little9 commented Jul 16, 2024

@ssciolla Thank you for reporting this! I agree with your reasoning and a PR would be awesome.

Maybe we could add some metadata indicating that the bag has been created by a new version of the library in bag-info.txt and change the validation operation if it is the newer version?

Happy to keep discussing this with you.

@little9 little9 pinned this issue Jul 16, 2024
@ssciolla
Copy link
Contributor Author

Thanks for the response, @little9! I will start working on a PR.

Your idea about keying off the version could work. My colleague also found a discussion in bagit-java about hidden files from years ago. I think this comment is interesting: LibraryOfCongress/bagit-java#31 (comment)

We could play it safe for the immediate future by adding an option to Bag that indicates whether to consider or look for hidden files, defaulting to false. Then you could have a minor release with this change, and then maybe change the default in a later major release (if there's a will to do that). You may also need some logic to handle validation when the setting is false, but there are hidden files in the manifest, maybe just raising warnings there. I'm going to start with this approach, but keep your idea in mind as well.

little9 added a commit that referenced this issue Jul 31, 2024
Add optional hidden file detection (#51)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants