
Include option to be used with -f <file> to accept null-character (zero byte) delimiting for file names. #416

Open
aghsmith opened this issue May 25, 2024 · 1 comment

Comments

@aghsmith

hashdeep can accept a list of files to be hashed, which can be very useful given its otherwise limited file selection ability.

I have used it with an invocation like:
hashdeep -c md5 -f <(find . -type f )
together with some other filters for find, which for the most part works. However, files on Unix-like file systems can contain \r and \n (carriage-return and line-feed) characters in their names.

hashdeep is too restrictive in accepting only newlines as the file-name delimiter with the -f option.

A file created like:
touch filename$'\r'
cannot be processed by hashdeep if its name arrives via find as above, even though hashdeep can handle the file fine when given its path directly rather than through an external file list.
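The ambiguity can be demonstrated without hashdeep at all (a small bash sketch; the temporary directory and file names are made up for illustration): newline-delimited output from find miscounts files whose names contain \r or \n, while NUL-delimited output does not.

```shell
# Sketch (bash): show why newline-delimited file lists are ambiguous.
dir=$(mktemp -d)
touch "$dir/filename"$'\r'              # name ends in a carriage return
touch "$dir/with"$'\n'"newline"         # name contains an embedded newline

# Newline-delimited: the embedded newline splits one path into two
# bogus "lines", so 2 files are reported as 3 records.
lines=$(find "$dir" -type f | wc -l | tr -d ' ')

# NUL-delimited: one NUL terminator per path, so the count is exact.
records=$(find "$dir" -type f -print0 | tr -dc '\000' | wc -c | tr -d ' ')

echo "lines=$lines records=$records"    # lines=3 records=2
rm -rf "$dir"
```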

find, however, has an option to NUL-separate file names:
find . -type f -print0

Currently hashdeep is unable to process this output (unless the list contains only a single file).
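Until such an option exists, one workaround is to avoid -f entirely and let the shell split on NUL, passing the names to hashdeep as ordinary arguments. This is only a sketch (assuming bash; the hashdeep invocation is guarded so the demo also runs where hashdeep is not installed):

```shell
# Workaround sketch (bash): consume find's NUL-delimited output in the
# shell instead of handing hashdeep a newline-delimited list file.
dir=$(mktemp -d)
touch "$dir/ordinary" "$dir/filename"$'\r'

# xargs -0 passes each NUL-terminated name as a separate argument,
# so hashdeep never has to parse any delimiter itself.
if command -v hashdeep >/dev/null 2>&1; then
    find "$dir" -type f -print0 | xargs -0 hashdeep -c md5
fi

# The same NUL splitting in pure bash: read -d '' stops at each NUL byte,
# so names containing \r or \n come through intact.
count=0
while IFS= read -r -d '' path; do
    count=$((count + 1))
done < <(find "$dir" -type f -print0)
echo "files seen: $count"   # both files, including the \r name

rm -rf "$dir"
```

Note that the xargs route may invoke hashdeep more than once for very long lists, producing multiple report headers; the feature requested here would avoid that.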

It seems like this would be a fairly easy feature to implement, and it would be very helpful for handling these edge cases with -f <file>.

@aghsmith
Author

I asked a question on stackexchange about how to solve my problem before realising that there probably isn't a good solution:
https://stackoverflow.com/questions/78530775/trouble-with-hashdeep-fed-by-find-and-unusual-characters

BTW, the reason I'm not just using hashdeep's own file selection is that we have a directory structure that places snapshot directories inside some of the directories we need to hash. We would wind up with hashdeep computing the hashes of essentially the same files more than once, which is time-consuming, even if the snapshot-related lines of output are removed later.
