DT_UNKNOWN cannot be ignored and is being mishandled #10

Open
mabrowning opened this issue Mar 1, 2024 · 0 comments

@mabrowning

I have been using locar to synchronize and manipulate data on a huge VAST cluster, which only reaches full I/O speed with lots of parallel transactions. By quickly parallelizing over my deep and wide directory structure, I was able to get speedups just as you advertise...

BUT: I just discovered that locar sometimes misses a huge 30%-90% of the files.

It turns out that the VAST NFS implementation appears to fill in d_type in the dirent structure only for the first ~10,000 entries of a directory and to return DT_UNKNOWN for the rest. locar does not handle this case correctly: when d_type is DT_UNKNOWN, it needs to fall back to stat(), unfortunately at the cost of an extra syscall. See https://stackoverflow.com/a/39430337/381313 for a description of the caveats of relying on d_type in dirents.
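A minimal sketch of that fallback, assuming Linux with glibc: trust d_type whenever the filesystem fills it in, and spend one extra fstatat() call (with AT_SYMLINK_NOFOLLOW, i.e. lstat() semantics) only when it reports DT_UNKNOWN. The names resolve_dtype, mode_to_dtype and count_regular_files are hypothetical and are not taken from locar's source.

```c
#define _GNU_SOURCE /* exposes d_type, DT_* constants, dirfd() and fstatat() in glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Translate an st_mode value into the DT_* constant readdir() would have used. */
static unsigned char mode_to_dtype(mode_t mode)
{
    if (S_ISREG(mode))  return DT_REG;
    if (S_ISDIR(mode))  return DT_DIR;
    if (S_ISLNK(mode))  return DT_LNK;
    if (S_ISFIFO(mode)) return DT_FIFO;
    if (S_ISSOCK(mode)) return DT_SOCK;
    if (S_ISCHR(mode))  return DT_CHR;
    if (S_ISBLK(mode))  return DT_BLK;
    return DT_UNKNOWN;
}

/* Return the type of `ent`, an entry read from `dir`.
 * d_type is only a hint: some filesystems (e.g. the NFS server in this issue)
 * leave it as DT_UNKNOWN, so fall back to one fstatat() call instead of
 * skipping the entry. Returns DT_UNKNOWN only if that stat call fails. */
static unsigned char resolve_dtype(DIR *dir, const struct dirent *ent)
{
    struct stat st;

    if (ent->d_type != DT_UNKNOWN)
        return ent->d_type; /* cheap path: no extra syscall */

    /* AT_SYMLINK_NOFOLLOW gives lstat() semantics, matching d_type for symlinks */
    if (fstatat(dirfd(dir), ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return DT_UNKNOWN;
    return mode_to_dtype(st.st_mode);
}

/* Count the regular files directly inside `path`; returns -1 on error. */
static int count_regular_files(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (resolve_dtype(dir, ent) == DT_REG)
            count++;
    }
    closedir(dir);
    return count;
}
```

With this, the only entries left as DT_UNKNOWN are ones that fstatat() itself cannot stat, and a subdirectory returned without d_type is still recognized as DT_DIR and can be descended into.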

```
$ ls directory | wc -l
13142

$ ./locar_linux_amd64 directory | wc -l
...
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9997.h5 iNode<10168721505490461340>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9998.h5 iNode<11256784254947026052>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9999.h5 iNode<14804106816623306678>[type:unknown(0)]
10692

$ ./locar_linux_amd64 directory -all | wc -l
13142
```
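To check how many entries a directory actually returns without d_type, and where in the listing they begin, a small diagnostic along these lines could be run against the affected directory. It is a sketch assuming Linux with glibc; report_unknown_dtype, print_report and struct dtype_report are hypothetical names, and the position reflects the order in which the server returned entries, not name order.

```c
#define _GNU_SOURCE /* exposes d_type and the DT_* constants in glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Summary of one directory listing. */
struct dtype_report {
    long total;         /* entries returned by readdir(), excluding "." and ".." */
    long unknown;       /* entries whose d_type was DT_UNKNOWN */
    long first_unknown; /* 0-based position of the first DT_UNKNOWN entry, or -1 */
};

/* Read `path` once with readdir() and record how often d_type was left unset.
 * Returns 0 on success, -1 if the directory cannot be opened. */
static int report_unknown_dtype(const char *path, struct dtype_report *out)
{
    DIR *dir = opendir(path);
    struct dirent *ent;

    if (dir == NULL)
        return -1;
    out->total = 0;
    out->unknown = 0;
    out->first_unknown = -1;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (ent->d_type == DT_UNKNOWN) {
            if (out->first_unknown < 0)
                out->first_unknown = out->total; /* position in readdir() order */
            out->unknown++;
        }
        out->total++;
    }
    closedir(dir);
    return 0;
}

/* Print a one-line summary of a report. */
static void print_report(const char *path, const struct dtype_report *r)
{
    printf("%s: %ld entries, %ld with DT_UNKNOWN, first at position %ld\n",
           path, r->total, r->unknown, r->first_unknown);
}
```

If the unknown entries begin near position 10,000 on the affected directory, that would support the threshold described above; on a local filesystem that always fills in d_type, the report should show zero unknown entries.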

This also means locar may fail to descend into some directories, since a subdirectory reported as DT_UNKNOWN would be skipped like any other entry. In my case, all of these DT_UNKNOWN entries seem to be regular files, so I can use --all as a workaround.
