
std::fs::copy fails on NFS volumes on CentOS 7 #75387

Closed
Gaelan opened this issue Aug 11, 2020 · 11 comments · Fixed by #75428
Labels
C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

Gaelan commented Aug 11, 2020

I tried this code:

```rust
// On a CentOS 7 VM, where the current directory is an NFS mount containing a file called "a" with any content
use std::fs;

fn main() {
    println!("Hello, world!");
    fs::copy("a", "b").unwrap();
}
```

I expected to see this happen: the file is copied successfully.

Instead, this happened:

```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 95, kind: Other, message: "Operation not supported" }', src/main.rs:4:4
```

Meta

rustc --version --verbose:

[Screenshot of the rustc --version --verbose output]

Backtrace

[Three screenshots of the backtrace]

Apologies for the screenshots; I'm running in a VM that I can't easily copy/paste out of.

See also rust-lang/rustup#2452, which has a likely explanation of the cause of this.

Gaelan added the C-bug Category: This is a bug. label Aug 11, 2020

Tavi-Kohn commented Aug 11, 2020

This bug doesn't occur every time, because I think fs::copy remembers whether the copy_file_range system call is available. If fs::copy is called with two files on the same XFS filesystem, it finds that the copy_file_range system call fails. It then sets a flag to always fall back to a more generic copy method, which prevents the bug from occurring on subsequent calls.

Code Example
```rust
use std::fs::copy;

const NFS_IN: &str = "/some/nfs/mount/in";
const NFS_OUT: &str = "/some/nfs/mount/out";

const XFS_IN: &str = "/some/xfs/mount/in";
const XFS_OUT: &str = "/some/xfs/mount/out";

fn main() {
    println!("{:?}", copy(NFS_IN, NFS_OUT)); // Err(Os { code: 95, kind: Other, message: "Operation not supported" })
    println!("{:?}", copy(XFS_IN, XFS_OUT)); // Ok(1)
    println!("{:?}", copy(NFS_IN, NFS_OUT)); // Ok(1)
}
```

jonas-schievink added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 11, 2020
the8472 (Member) commented Aug 11, 2020

This might be a kernel bug or a documentation error, because EOPNOTSUPP is not listed as a possible error in the copy_file_range man page, and the kernel even has a warning that this shouldn't happen, along with a commit comment saying that it's the responsibility of the filesystem to perform the fallback.
And indeed, NFS does have fallback code.

So that's probably fixed in a newer kernel version, but who knows with Red Hat's frankenkernels.

the8472 (Member) commented Aug 11, 2020

If a kernel update doesn't fix it, you could also report this to the distro maintainers; they might have missed something when backporting patches. I haven't looked at the CentOS kernel sources, though, so that's just a guess.

Mark-Simulacrum (Member) commented:

Cc @cuviper @joshtriplett, though not sure if you are the right people to ask about the possible kernel issue mentioned above.

Regardless, we will likely need to handle this ourselves, since kernel or distro updates will probably be slow to arrive.

the8472 (Member) commented Aug 11, 2020

Ok, should be easy enough. @rustbot claim

the8472 (Member) commented Aug 11, 2020

> This bug doesn't occur every time, because I think fs::copy remembers if the copy_file_range system call is available.

It only remembers that if it encounters specific error codes (ENOSYS or EPERM but not EOPNOTSUPP). So it should also try copy_file_range on the second attempt and encounter the same error.
Could this be related to automounting?

Can you trace the syscalls of your test program via strace -ff [...] and post the output?

cuviper (Member) commented Aug 11, 2020

I know that RHEL 7.8 disabled copy_file_range -- see the last note in the 7.8 release notes, section 9.4:

> The copy_file_range() call has been disabled on local file systems and in NFS
>
> The copy_file_range() system call on local file systems contains multiple issues that are difficult to fix. To avoid file corruptions, copy_file_range() support on local file systems has been disabled in RHEL 7.8. If an application uses the call in this case, copy_file_range() now returns an ENOSYS error.
>
> For the same reason, the server-side-copy feature has been disabled in the NFS server. However, the NFS client still supports copy_file_range() when accessing a server that supports server-side-copy.

However, I think an EOPNOTSUPP from NFS accidentally leaked through, and will be changed to ENOSYS too:
https://bugzilla.redhat.com/show_bug.cgi?id=1783554

the8472 (Member) commented Aug 11, 2020

Oh, that's a mess. So it returns ENOSYS in most cases, but in some cases copy_file_range would still succeed? Then the detection logic is overly pessimistic, but I guess that's OK if it was broken in older CentOS versions.
I'll treat EOPNOTSUPP like ENOSYS then.
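To make the behavior discussed above concrete, here is a simplified model (not std's actual source) of the decision logic as the8472 describes it: a sticky flag is cleared only on ENOSYS or EPERM, and any other error is returned to the caller. `fake_copy_file_range` and `CopyModel` are hypothetical names; the fake syscall returns the errno values from Tavi-Kohn's strace output in this thread (ENOSYS on XFS, EOPNOTSUPP on NFS), and the errno constants are assumed to be the Linux x86-64 numbers. The `treat_eopnotsupp_as_enosys` switch models the workaround proposed here.

```rust
// Linux errno numbers (x86-64 / generic numbering), hard-coded so the
// sketch needs no `libc` crate. Other architectures may use other values.
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;
const EOPNOTSUPP: i32 = 95;

/// Hypothetical stand-in for copy_file_range(2): returns the errno values
/// seen in the strace output for each filesystem.
fn fake_copy_file_range(fs: &str) -> Result<u64, i32> {
    match fs {
        "nfs" => Err(EOPNOTSUPP),
        "xfs" => Err(ENOSYS),
        _ => Ok(1),
    }
}

/// Simplified model of fs::copy's fallback decision (not std's source).
struct CopyModel {
    /// Sticky flag: once cleared, copy_file_range is never attempted again.
    has_copy_file_range: bool,
    /// Models the proposed workaround: treat EOPNOTSUPP like ENOSYS.
    treat_eopnotsupp_as_enosys: bool,
}

impl CopyModel {
    fn new(treat_eopnotsupp_as_enosys: bool) -> Self {
        CopyModel {
            has_copy_file_range: true,
            treat_eopnotsupp_as_enosys,
        }
    }

    fn copy(&mut self, fs: &str) -> Result<u64, i32> {
        if self.has_copy_file_range {
            match fake_copy_file_range(fs) {
                Ok(n) => return Ok(n),
                Err(e)
                    if e == ENOSYS
                        || e == EPERM
                        || (e == EOPNOTSUPP && self.treat_eopnotsupp_as_enosys) =>
                {
                    // Remember that the syscall is unusable, then fall through.
                    self.has_copy_file_range = false;
                }
                // Every other error, including EOPNOTSUPP without the
                // workaround, is returned to the caller unchanged.
                Err(e) => return Err(e),
            }
        }
        // Generic read/write fallback, assumed to copy the 1-byte test file.
        Ok(1)
    }
}

fn main() {
    for &workaround in &[false, true] {
        let mut model = CopyModel::new(workaround);
        let results: Vec<Result<u64, i32>> =
            ["nfs", "xfs", "nfs"].iter().map(|fs| model.copy(fs)).collect();
        // workaround=false: [Err(95), Ok(1), Ok(1)]
        // workaround=true: [Ok(1), Ok(1), Ok(1)]
        println!("workaround={}: {:?}", workaround, results);
    }
}
```

Without the workaround, the three calls reproduce the Err(95), Ok(1), Ok(1) sequence from Tavi-Kohn's example: the NFS error does not clear the flag, but the ENOSYS from XFS does, so the second NFS copy never reaches the failing syscall. With the workaround, the very first NFS copy already falls back.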

joshtriplett (Member) commented:

Yes, treating EOPNOTSUPP as ENOSYS here seems like an appropriate workaround for the kernel bug.

Tavi-Kohn commented Aug 12, 2020

I've run strace on the test program I wrote earlier.
For two files on an XFS filesystem:

```
copy_file_range(3, NULL, 4, NULL, 1, 0) = -1 ENOSYS (Function not implemented)
```

On an NFS filesystem:

```
copy_file_range(3, NULL, 4, NULL, 1, 0) = -1 EOPNOTSUPP (Operation not supported)
```

Tavi-Kohn commented:

For some systems (like the one I'm working with), updating is likely not going to happen, and this bug pretty much breaks rustup.

I agree with treating EOPNOTSUPP as ENOSYS. I still don't know precisely why the kernel bug occurs, but I can't think of a reason why this workaround would be problematic.

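Future testing for this and related bugs should be done carefully. A simple cargo test setup can fail to catch the bug because of the copy_file_range check in fs::copy mentioned earlier: once any call in the test process hits ENOSYS, fs::copy stops trying copy_file_range, so later calls never reach the failing syscall.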
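For systems where neither the kernel nor the toolchain can be updated soon, application code could guard against this error itself. The following is one possible workaround, sketched under the assumption that retrying with an ordinary read/write loop is acceptable; it is not something proposed in this thread. `copy_with_fallback` is a hypothetical helper, not part of std: it calls `std::fs::copy` and, only when the OS error is 95 (EOPNOTSUPP on Linux, as in the panic message above), redoes the copy by hand. The loop is written out instead of using `io::copy`, since on newer Linux toolchains `io::copy` between two `File`s may itself try `copy_file_range`.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

// Linux EOPNOTSUPP (x86-64 / generic numbering), the `code: 95` in the panic
// message; hard-coded so the sketch needs no `libc` crate.
const EOPNOTSUPP: i32 = 95;

/// Hypothetical helper (not part of std): try fs::copy first and, if the
/// kernel reports EOPNOTSUPP, redo the copy with a plain read/write loop.
fn copy_with_fallback<P: AsRef<Path>, Q: AsRef<Path>>(from: P, to: Q) -> io::Result<u64> {
    let (from, to) = (from.as_ref(), to.as_ref());
    match fs::copy(from, to) {
        Err(e) if e.raw_os_error() == Some(EOPNOTSUPP) => {
            let mut reader = File::open(from)?;
            let mut writer = File::create(to)?; // truncates any partial output
            let mut buf = [0u8; 64 * 1024];
            let mut total = 0u64;
            loop {
                let n = match reader.read(&mut buf) {
                    Ok(0) => break, // end of file
                    Ok(n) => n,
                    Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                };
                writer.write_all(&buf[..n])?;
                total += n as u64;
            }
            // fs::copy also copies the permission bits; mirror that here.
            fs::set_permissions(to, reader.metadata()?.permissions())?;
            Ok(total)
        }
        // Success, or any other error: pass it through unchanged.
        other => other,
    }
}

fn main() -> io::Result<()> {
    // Demo in the temp directory; on an unaffected filesystem the first
    // fs::copy simply succeeds and the fallback path is not taken.
    let dir = std::env::temp_dir();
    let src = dir.join("copy_with_fallback_demo_src");
    let dst = dir.join("copy_with_fallback_demo_dst");
    fs::write(&src, b"a")?;
    let n = copy_with_fallback(&src, &dst)?;
    println!("copied {} byte(s)", n);
    Ok(())
}
```

The fallback branch only runs when the destination is on an affected NFS mount, so exercising it needs such a system; elsewhere the helper behaves exactly like `fs::copy`.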

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 5, 2020
…triplett

Workarounds for copy_file_range issues

fixes rust-lang#75387
fixes rust-lang#75446
bors closed this as completed in de921ab Sep 5, 2020