Request: a way to tell if two filepaths point to the same file #17873

Open
Tracked by #64596
ljw1004 opened this issue Jul 18, 2016 · 11 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Comments

@ljw1004

ljw1004 commented Jul 18, 2016

On unix, the recommended way to tell whether two filepaths point to the same file is to use stat and compare st_dev, st_ino.

http://www.boost.org/doc/libs/1_53_0/libs/filesystem/doc/reference.html#equivalent

On Windows, the recommended way is to use the win32 API GetFileInformationByHandle and compare nFileIndexLow, nFileIndexHigh, dwVolumeSerialNumber. You have to do this via p/invoke because there's no .NET wrapper for it.
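For illustration only, here is a minimal sketch of the identity comparison described above, written in Python because it exposes `stat` directly. It is not the .NET API being requested. `same_file` is a hypothetical helper name. On Unix, `os.stat` reports `st_dev` and `st_ino`; per the Python documentation, `st_ino` holds the file index on Windows, so the same comparison is assumed to carry over there.

```python
import os
import tempfile


def same_file(path1, path2):
    """Return True if both paths resolve to the same file system object.

    Hypothetical helper: compares the (device, inode) pair from os.stat,
    which follows symbolic links by default and raises OSError if either
    path cannot be stat'ed.
    """
    s1 = os.stat(path1)
    s2 = os.stat(path2)
    return (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        for p in (a, b):
            with open(p, "w") as f:
                f.write("x")
        # Two spellings of one path compare equal; two distinct files do not.
        print(same_file(a, os.path.join(d, ".", "a.txt")))
        print(same_file(a, b))
```

The comparison never looks at the path strings themselves, which is why it is robust to different spellings of the same location.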

P/invoke was tolerable to me because it was all self-contained and easy. But as I port my code over to corefx, I honestly can't be bothered to go the whole nuget platform-specific native binary route. It's far too much work.

I'd love it if you could add an API to corefx to judge whether two filenames point to the same file. Judging by the huge number of requests for this on stackoverflow, it seems like a common bread-and-butter scenario.

@karelz
Member

karelz commented Oct 11, 2016

Related to #14321

We need an API proposal.

@JeremyKuhne
Member

Perhaps

bool System.IO.Path.HaveSameTarget(string path1, string path2)

Behavior:

  • Throws ArgumentNullException for null or empty paths
  • Throws relevant I/O exceptions if it cannot open a handle
  • Follows links (does not open handles on the links themselves)
  • Works with files or directories
  • Works with locked files

We may also want to consider exposing this for existing SafeFileHandles. Perhaps bool SafeFileHandle.HasSameTarget(SafeFileHandle fileHandle).
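A rough sketch of the behavior listed above, in Python rather than C# since the proposed method does not exist yet. `have_same_target` is a hypothetical name mirroring the proposal, and `ValueError` stands in for `ArgumentNullException`. The sketch relies on `os.path.samefile`, which stats both paths (following links) and compares device and inode numbers. It does not address the "works with locked files" point.

```python
import os
import tempfile


def have_same_target(path1, path2):
    """Hypothetical Python analogue of the proposed Path.HaveSameTarget."""
    for p in (path1, path2):
        if not p:
            # Stand-in for ArgumentNullException on null or empty paths.
            raise ValueError("path must be a non-empty string")
    # Raises OSError if either path cannot be stat'ed; follows symlinks;
    # works for files and directories alike.
    return os.path.samefile(path1, path2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "dir")
        link = os.path.join(d, "link")
        os.mkdir(target)
        os.symlink(target, link)  # may need extra privileges on Windows
        print(have_same_target(target, link))
```

Because the check stats the link targets and not the links themselves, a directory and a symlink to it are reported as the same target.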

@danmoseley
Member

@JeremyKuhne, @ianhays do you consider this ready for API review?

@am11
Member

am11 commented Feb 14, 2017

Please consider both APIs: checking the equality based on real-path and getting the real-path for given path:

public static class Directory
{
    // resolves symlink etc.
    public static string GetRealPath(string path);

    // resolves symlink and performs equality check
    public static bool HaveSameTarget(string path1, string path2);
}

public static class File
{
    // resolves symlink etc.
    public static string GetRealPath(string path);

    // resolves symlink and performs equality check
    public static bool HaveSameTarget(string path1, string path2);
}
  • HaveSameTarget is proposed in Directory and File types because existing System.IO.Path APIs are mostly doing canonical operations (no real FS / IO intervention). Therefore, Directory, File, DirectoryInfo and FileInfo seem to be good candidates for this method. However, static methods are conventionally added to Directory and File types.
  • GetRealPath will help us keep track of recursive symlinks (dotnet/corefx#16094).
  • GetRealPath will also cover the use case of "Add API to get actual file casing in path" (#14321), if the method returns the real casing on case-insensitive filesystems.

@ljw1004
Author

ljw1004 commented Feb 14, 2017

What exactly is the definition of "Real Path"? You wrote "resolves symlinks etc" but there's a lot buried in that...

  • On unix if you have two hard links then there's no unique real path. Likewise in Windows if you have junction points.
  • On Windows if you use "subst" on your hard drive then for instance x:\file.txt and c:\directory\file.txt might both be the same file, but I don't know which of them is the more "real".
  • On Windows \\?\C:\directory\file.txt and c:\directory\file.txt both point to the same file but I don't know which name is more "real".
  • On Windows you might have a network-path with \\?\ or a subst path like x:\file.txt, both pointing to the same file, but I don't know which is real.
  • If you have a recursive symlink, what should GetRealPath do?

I think the concept of a "real path" is far too woolly. It certainly can't be used as the implementation technique for HaveSameTarget. The best you could do is have a function called "ResolveSymlinks" (but I think that API is questionable -- the reason users use symlinks is because they want applications to see a particular directory structure, and they don't want applications second-guessing them).

@ianhays
Contributor

ianhays commented Feb 14, 2017

Before we mark this as ready for review we should update the issue with a more concrete API proposal including a finalized API, some code examples, and justification for real-world usage. I think Jeremy's post is on the right track for desired behavior, we just need something more formal before we should move forward.

@am11
Member

am11 commented Feb 14, 2017

"there's a lot buried in that"

Agree on that. I am not an expert but I think the API can define reasonable constraints. I thought HaveSameTarget can make use of GetRealPath under the hood.

"I think that API is questionable"

On the contrary, I think there really is a need for resolving the real path in .NET (especially Core), as this is provided by almost all other language frameworks.

PHP http://php.net/manual/en/function.realpath.php
Perl http://perldoc.perl.org/Cwd.html
Python https://docs.python.org/2/library/os.path.html#os.path.realpath
Ruby http://apidock.com/ruby/Pathname/realpath
node.js https://nodejs.org/api/fs.html#fs_fs_realpath_path_options_callback

Symlinks are used more commonly on Unix systems and hence more pain/complaints from that side of the house: dotnet/coreclr#2128.

On .NET, even the Windows-only P/Invoke based solutions such as this one are quite complicated to implement IMO.

@ljw1004
Author

ljw1004 commented Feb 14, 2017

"I thought HaveSameTarget can make use of GetRealPath under the hood." -- no it can't! I gave examples where the correct solution (inode &c.) will correctly claim that two files are the same, but GetRealPath (as defined in PHP and Python and nodejs) will claim they're different.
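The hard-link case can be reproduced with a short Python sketch (illustrative only; `compare_hard_links` is a hypothetical helper, and the file system is assumed to support hard links). It creates two hard links to one file and compares them both ways: by resolved path string and by device/inode identity.

```python
import os
import tempfile


def compare_hard_links():
    """Return (equal_by_realpath, equal_by_identity) for two hard links."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("data")
        os.link(original, alias)  # second directory entry, same inode

        # realpath only resolves symlinks, so the two names stay distinct.
        by_realpath = os.path.realpath(original) == os.path.realpath(alias)
        # samefile compares st_dev and st_ino, so it sees one file.
        by_identity = os.path.samefile(original, alias)
        return by_realpath, by_identity


if __name__ == "__main__":
    print(compare_hard_links())  # prints (False, True)
```

Neither name is more "real" than the other here, so a comparison built on path resolution has nothing to converge on.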

@JeremyKuhne
Member

I'll chime in more later when I have more time. I'll point out #14321 again as it discusses overlapping issues.

One initial thought is that we have to think about multiple drivers (such as mounted network shares) in how we define anything. I don't know that it is possible to say that two files definitively aren't the same if they don't share the same actual device (e.g. \Device\Harddisk0\ on Windows).

@JeremyKuhne
Member

This issue has come up in other forums such as Twitter/StackOverflow

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@hamarb123
Contributor

I would like this api also.
