Add API to get actual file casing in path #14321

Open
Tracked by #64596
ellismg opened this issue Mar 4, 2015 · 25 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@ellismg
Contributor

ellismg commented Mar 4, 2015

Per @jasonwilliams200OK in dotnet/coreclr#390

Based on this answer: http://stackoverflow.com/a/81493/1712065 (further redirected from: http://stackoverflow.com/a/326153).

Please add the ability to retrieve a path with its actual casing via the FileInfo and DirectoryInfo classes. The candidate members are FullName and Name. Perhaps there is some sophisticated way of getting it from the Win32 file system API, but the linked answer seems to be a working solution.

Expected:

using System;
using System.IO;
// ..
// actual path on disk:
// C:\SharePoint\scripts\MasterDeployment.ps1
FileInfo info = new FileInfo("c:\\sharepoint\\scriPts\\maSTerdeployment.PS1");
Console.WriteLine(info.FullName);
// C:\SharePoint\scripts\MasterDeployment.ps1
Console.WriteLine(info.Name);
// MasterDeployment.ps1

Actual result:

using System;
using System.IO;
// ..
// actual path on disk:
// C:\SharePoint\scripts\MasterDeployment.ps1
FileInfo info = new FileInfo("c:\\sharepoint\\scriPts\\maSTerdeployment.PS1");
Console.WriteLine(info.FullName);
// c:\sharepoint\scriPts\maSTerdeployment.PS1
Console.WriteLine(info.Name);
// maSTerdeployment.PS1
@ghost

ghost commented Mar 4, 2015

NTFS is case-sensitive; it's Windows that makes it look bad. :)

Thank you @ellismg. But honestly, I opened this issue in the coreclr repo because I think the solution should come from the lowest possible API, instead of C# manipulation of canonical paths or repeated expensive IO operations made from a distance. I still stand by and vote for the coreclr option.

Having no solution is better than a pure managed solution (like the present one), so that people don't start relying on a badly performing algorithm out of ignorance.

CoreCLR solution = C/C++ = fast performance
CoreFX solution = .NET runtime = not so fast performance

@ellismg
Contributor Author

ellismg commented Mar 5, 2015

@jasonwilliams200OK For .NET Core, the source for the lowest-level path APIs is in the CoreFX repository. The implementation in mscorlib.dll in CoreCLR is not what we expose as part of .NET Core and will be removed if possible.

I don't fully understand your comment about a CoreFX solution having worse performance than one in CoreCLR. In either case, we would likely implement the feature the same way, by P/Invoking the relevant OS APIs. The native portions of the runtime are not used for any of the File APIs that are exposed to managed code.

@ghost

ghost commented Mar 6, 2015

@ellismg, thanks for expanding on the general workflow here.

But unfortunately, this particular issue focuses on a limitation of the Windows API, which does not emit the actual casing to the application tier. To overcome this problem, we have two options in .NET:

  • Tap into native code and call this low-level shell function: SHGetFileInfo, like this. Super fast solution.
  • Stay managed, stay green and do not think native: walk through the path and join the pieces recursively, like this. Tremendously slow compared to the native solution; even a little stress testing makes the difference in magnitude easy to observe.

This is my understanding.

I hope there exists a better way (O(1)) to query the file-system for the exact path for .NET Core that I am unaware of, otherwise recursive walking will be a perf. nightmare.

@ghost

ghost commented Mar 6, 2015

One crazy way to solve it quickly is shelling out cmd commands:

e:\> c:
c:\> cd c:\sharepoint\scriPts
C:\SharePoint\scripts> echo %cd%\MasterDeployment.ps1
:: returns C:\SharePoint\scripts\MasterDeployment.ps1

or one liner:

c: && cd c:\sharepoint\scriPts && echo %cd%\MasterDeployment.ps1

Update:

The above solution was not resolving the filename. So here is a working solution:

e:\> c:
c:\> cd c:\sharepoint\scriPts
C:\SharePoint\scripts> set var1=%cd%
C:\SharePoint\scripts> for /f "delims=" %A in ('dir maStErdeployment.pS1 /B') do set "var2=%A"
C:\SharePoint\scripts> echo %var1%\%var2%
:: returns C:\SharePoint\scripts\MasterDeployment.ps1

Note that this will not work if you issue dir c:\sharepoint\scriPts\maStErdeployment.PS1 /B from e:\, because dir only resolves the true case of the filename, not of the directories. You need to cd into the directory first to capture the output of echo %cd%, then store the filename output separately.

But issuing shell commands is a bad workaround.

@ghost

ghost commented Jul 2, 2015

With dotnet/corefx#2219, I added GetActualCasing to File and Directory. Should we instead wait for CoreFx Path?

As you can see in dotnet/corefx@36b4113, the method bodies differ because the filename does not get normalized by the underlying native method, so we have to do a little extra work to get the casing of the "filename part" correct in the case of File.

@ellismg, I didn't get any feedback on this for months, so I decided to take a stab. Is there another, smarter way to skip this part, as this is not a full-fledged API but a simple (but tricky) convenience method?

@ghost

ghost commented Sep 10, 2015

@JeremyKuhne, since I have noticed you had been involved in some path-related features, can you please review this one? Do you have any objections or suggestions on implementing this feature like 36b4113? Please feel free to criticize, I would love to have this functionality in CoreFX someday.

@terrajobst terrajobst assigned JeremyKuhne and unassigned ericstj Sep 15, 2015
@terrajobst
Member

@JeremyKuhne, it seems Roslyn had a similar request for an API that gets the canonical path, i.e. resolves sym links, gets the correct casing etc. What's your take?

@stephentoub
Member

I'm not clear on what such an API would mean in a typical Unix context, where file AbC is unrelated to file ABc. It'd just return the original supplied string without modification?

public static string GetActualCasing(string path) { return path; }

?

@JeremyKuhne
Member

@jasonwilliams200OK, @terrajobst: We do need stuff along these lines, what I was going to suggest was Path.GetFinalPath to match the semantics we're already using (e.g. aligning with the File Management APIs in Windows). Essentially it would just be a call to GetFinalPathNameByHandle() on Windows and realpath() on Linux. Symbolic links will be resolved and the file will have to exist.

@stephentoub I think this api/behavior would address what you're bringing up. I don't think there is a way to specify files with non canonical casing even when using case-insensitive file systems (NTFS) on Unix- but I'm certainly not positive. Anyone know?

As far as not resolving sym links goes, I'm not sure what the best available APIs are to make this happen or what we would call it. I suppose if we had a Path.GetCanonicalPath instead of GetFinalPath (matching Linux/Java semantics) we could add an overload to not resolve sym links?

Note that on Windows you can find the right casing by walking DirectoryInfo/FileInfo objects. The file name that comes back from FindFirst/NextFile is always in the right case.
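As a rough illustration of the walking approach mentioned above, here is a minimal sketch. It assumes the path exists and that every parent directory can be listed. `PathCasingSketch` and `GetActualCasing` are hypothetical names, not BCL APIs. The root (drive letter, UNC share or "/") is returned as supplied, 8.3 short names are not handled, and symbolic links are not resolved. It enumerates each parent directory in full instead of passing the segment as a search pattern, so the match does not depend on the platform's default pattern casing.

```csharp
using System;
using System.IO;

public static class PathCasingSketch
{
    // Hypothetical helper: rebuilds `path` one segment at a time, taking the
    // spelling of each segment from the listing of its parent directory.
    public static string GetActualCasing(string path)
    {
        string fullPath = Path.GetFullPath(path);
        string root = Path.GetPathRoot(fullPath) ?? string.Empty;

        // Assumption: the root is kept exactly as supplied.
        string result = root;

        string[] segments = fullPath.Substring(root.Length).Split(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string segment in segments)
        {
            string match = string.Empty;

            foreach (string entry in Directory.EnumerateFileSystemEntries(result))
            {
                string name = Path.GetFileName(entry);

                // An exact match always wins (relevant on case-sensitive volumes).
                if (string.Equals(name, segment, StringComparison.Ordinal))
                {
                    match = name;
                    break;
                }

                // Otherwise remember the first entry that differs only by case.
                if (match.Length == 0 &&
                    string.Equals(name, segment, StringComparison.OrdinalIgnoreCase))
                {
                    match = name;
                }
            }

            if (match.Length == 0)
            {
                throw new FileNotFoundException(
                    "No entry matches '" + segment + "' under '" + result + "'.", fullPath);
            }

            result = Path.Combine(result, match);
        }

        return result;
    }
}
```

Each segment costs one directory enumeration, which is the performance concern raised earlier in the thread. Calling `PathCasingSketch.GetActualCasing(@"c:\sharepoint\scriPts\maSTerdeployment.PS1")` would return the on-disk spelling of every segment after the root.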

@ghost

ghost commented Sep 16, 2015

@JeremyKuhne, during this exercise: dotnet/corefx@36b4113 one thing I figured out was that we can get correct casing with Interop.mincore.GetLongPathName, which is pretty performant IMO (compared to walking down the [Directory/File]Info slugs). The only part it doesn't correct is the filename in case of File, for which I used FileInfo (only for the last, filename part): File.cs#L360. For Directory, this workaround is not required.
I didn't handle the Unix case, which can just return the realpath() result as you suggested.

@JeremyKuhne
Member

Hmm- I hadn't seen that behavior when I was playing with GetLongPathName- I'll take a closer look. The Windows file system folks mentioned that there was a way to get the correct cased name without following sym links- I'll ping them again to see if I can drag out details.

@JeremyKuhne
Member

@jasonwilliams200OK - The thing I was missing was FILE_FLAG_OPEN_REPARSE_POINT. GetFinalPathNameByHandle will give you the real canonical name on whatever file you open. If you pass the flag above when calling CreateFile it won't follow the reparse points and will, instead, open the actual link.
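For illustration, the CreateFile plus GetFinalPathNameByHandle combination described above might be sketched with P/Invoke as follows. This is an assumption-laden sketch, not the CoreFX implementation: `FinalPathSketch`, `GetFinalPath` and the `followLinks` parameter are hypothetical names, the code is Windows-only, and the prefix clean-up only covers the `\\?\` and `\\?\UNC\` forms.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;
using Microsoft.Win32.SafeHandles;

public static class FinalPathSketch
{
    private const uint FileShareAll = 0x1 | 0x2 | 0x4;           // read | write | delete
    private const uint OpenExisting = 3;
    private const uint FileFlagBackupSemantics = 0x02000000;     // needed to open directories
    private const uint FileFlagOpenReparsePoint = 0x00200000;    // open the link itself

    [DllImport("kernel32.dll", EntryPoint = "CreateFileW", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern SafeFileHandle CreateFile(
        string fileName, uint desiredAccess, uint shareMode, IntPtr securityAttributes,
        uint creationDisposition, uint flagsAndAttributes, IntPtr templateFile);

    [DllImport("kernel32.dll", EntryPoint = "GetFinalPathNameByHandleW", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetFinalPathNameByHandle(
        SafeFileHandle file, StringBuilder filePath, uint filePathLength, uint flags);

    // Hypothetical helper: returns the path the OS reports for an existing file or directory.
    public static string GetFinalPath(string path, bool followLinks)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("This sketch only covers the Win32 route.");

        uint flags = FileFlagBackupSemantics;
        if (!followLinks) flags |= FileFlagOpenReparsePoint;

        // Desired access 0: only metadata is queried, no data is read.
        SafeFileHandle handle = CreateFile(path, 0, FileShareAll, IntPtr.Zero, OpenExisting, flags, IntPtr.Zero);
        int openError = Marshal.GetLastWin32Error();

        using (handle)
        {
            if (handle.IsInvalid) throw new Win32Exception(openError);

            var buffer = new StringBuilder(260);
            uint length = GetFinalPathNameByHandle(handle, buffer, (uint)buffer.Capacity, 0);
            if (length >= (uint)buffer.Capacity)
            {
                // Buffer too small: the return value is the required size, so retry once.
                buffer = new StringBuilder((int)length);
                length = GetFinalPathNameByHandle(handle, buffer, (uint)buffer.Capacity, 0);
            }
            if (length == 0) throw new Win32Exception(Marshal.GetLastWin32Error());

            // Clean up the extended-length prefix that the API adds.
            string result = buffer.ToString();
            if (result.StartsWith(@"\\?\UNC\", StringComparison.Ordinal)) return @"\\" + result.Substring(8);
            if (result.StartsWith(@"\\?\", StringComparison.Ordinal)) return result.Substring(4);
            return result;
        }
    }
}
```

Passing `followLinks: false` adds FILE_FLAG_OPEN_REPARSE_POINT so the link itself is opened, as described in the comment above. As with realpath(), the path has to exist.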

@ghost

ghost commented Sep 17, 2015

@JeremyKuhne, thanks for the info! And good to know that Win32 API provides the functionality. :)

Would it make sense to change the behavior of the existing [Directory/File]Info.[Full]Name properties to make them emit the actual casing? IMO, this breaking change would only affect scenarios in which consumers make case-sensitive path comparisons, which doesn't happen very often in production code targeting Windows environments.

This way, we wouldn't be needing an additional method / property for this; only fixing the behavior of FullName and Name in DirectoryInfo and FileInfo for Windows.

@JeremyKuhne
Member

@jasonwilliams200OK The Name property should already be in the proper case. My only concern would be adding hidden perf impact as we'd have to make a call to CreateFile, then GetFinalPathNameByHandle, then clean up the prefixing. I suppose we could optimize that away when we're starting from an Info class so probably not too bad...

I'll dig in a bit more as I get a chance.

@karelz
Member

karelz commented Oct 11, 2016

We need a formal API proposal. Does anyone want to pick it up?

@karelz karelz changed the title Feature Request - Add ability to get actual file casing in path Add API to get actual file casing in path Nov 23, 2016
@carlreinke
Contributor

carlreinke commented Oct 16, 2017

The Java equivalent of this API request is Path.toRealPath(...). It is important that it be possible to get the properly-cased path without resolving symbolic links (an option which Java provides).

Although, browsing the Java source code, it's not clear to me that it looks up properly-cased filenames on a case-insensitive filesystem (ex. vfat) in Linux... and I'm not certain that it's actually possible. Even realpath gives back the same casing that it is given rather than the actual casing of the filename.

@danmoseley
Member

@carlreinke if you would like to make an API proposal (just picking on you since perhaps you have an interest) the process is here:

https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md

It would probably be clearest to close this and make a new issue for the proposal, since then it can be top posting.

@ghost

ghost commented Mar 3, 2018

The original proposal is too specific to GetActualCasing, which will only benefit case-insensitive file systems.

If we think about what else is missing, then combining it with the features of UNIX realpath makes sense.

Proposal:

public static class Path
{
    public static string GetRealPath ( string path );
    public static string GetRealPath ( ReadOnlySpan<char> path );
    public static string GetRealPath ( ReadOnlyMemory<char> path );
}

On case-sensitive file systems, GetRealPath resolves path if it is a symlink until the first non-symlink path is found. On case-insensitive file systems, it will do the same and additionally it will return the correct casing.


Either this should be a superset of one of https://github.com/dotnet/corefx/issues/25569 or https://github.com/dotnet/corefx/issues/24685, or vice versa: one of them should fold in this functionality.

@JeremyKuhne
Member

Triage: We want to do this, but there are questions around performance (as we have to walk the full path) and what to do for paths that don't exist or are not accessible. We also need to understand what to do with drive letters and validate casing requirements for device paths (e.g. \\.\Volume....). Effectively we need GetCanonicalCasingForExistingPath but with a better name.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@carlossanlop carlossanlop modified the milestones: 5.0.0, Future Jun 18, 2020
@iSazonov
Contributor

PowerShell needs this and currently uses a workaround GetCorrectCasedPath()
https://github.com/PowerShell/PowerShell/blob/1e940f55b1f265c9640d622937383208e13a849b/src/System.Management.Automation/namespaces/FileSystemProvider.cs#L124

@carlossanlop
Member

If an OS allows using case-insensitive paths, why should it be considered a problem if the specified path string does not have the exact same casing as the actual path? It is allowed behavior. What would be the benefit of fixing this?

This comment in the linked PowerShell issue offers a related explanation:

Well, it's not strictly a functional problem, because it only applies to platforms with case-insensitive file systems, notably macOS and Windows - on Linux, with its case-sensitive file system you have to supply the case-accurate representation to begin with, otherwise you won't find the file / directory.

Also, suggesting a cross-platform API to get the real path would not make sense in Unix, as it was pointed out here.

@PathogenDavid
Contributor

Also, suggesting a cross-platform API to get the real path would not make sense in Unix, as it was pointed out here.

Beyond the extent of any APIs it can provide to help this issue, the OS should really not be a consideration here. Case sensitivity is an attribute of the file system, not the operating system. Apple's file systems have always supported both due to legacy Mac OS's case sensitivity. EXT4 supports toggling case insensitivity at the directory level. Linux (and notably WSL) has to deal with NTFS all the time. Heck, everything can mount FAT volumes which are case-insensitive.

If an OS allows using case-insensitive paths, why should it be considered a problem if the specified path string does not have the exact same casing as the actual path? It is allowed behavior. What would be the benefit of fixing this?

  • Detecting when something is taking advantage of the case-insensitive file system when it shouldn't because it needs to be cross-platform.
  • Logging the canonical casing of a path when the user provided the wrong casing.
  • Scenarios when file paths come from more than one location.

The scenario that prompted me to follow this issue in the first place is largely the third problem, but I've run into all three.

A project I'm currently working on involves parsing C++ code using the Clang compiler. From a high level view, one aspect of that process looks like this:

  1. Ask the compiler to compile a bunch of files (which is done by compiling a single file which #includes them all for technical reasons)
  2. Get a big tree of syntax nodes back
  3. Work through those syntax nodes and if they correspond to one of the original input files, process them

The "if they correspond" bit is the source of some wacky edge-case bugs right now. The main issue is that the paths I get back from Clang might not be the same paths I originally put in because files can be included more than once in C++. Currently we assume everything is case-insensitive, and this will do the right thing in 99.9% of cases. If you want to read about those 0.1% of cases I did a more detailed quick-and-dirty writeup of the issues we face here: MochiLibraries/Biohazrd#1 (comment)

I definitely see this API as something for solving weird edge cases. You should not normally need it, but when you do it's annoying to get right. The processing of paths has always been a huge source of edge case bugs in software (see countless security issues caused by directory traversal attacks, the need for Path.Combine, or all the little edge cases Path.GetFullPath deals with), so I think this is something best provided by the BCL.

@iSazonov
Contributor

iSazonov commented Oct 3, 2020

What would be the benefit of fixing this?

In the PowerShell repository, the main motivation was "users want nice output" like Windows Explorer provides. It was approved and implemented by the PowerShell team at Microsoft.

Also, suggesting a cross-platform API to get the real path would not make sense in Unix

As mentioned, it is file system behavior. And we should consider both the local scenario (mounting NTFS on Unix and ext4 on Windows) and the remote scenario (PowerShell can connect to a remote computer: what is the expected behavior in that case?).

/cc @mklement0

@MichalPetryka
Contributor

A good use case for this API (maybe even the most important) is interaction with tools like git, which are case-sensitive even on case-insensitive filesystems.
And as said before, the API should be cross-platform, as you can use case-insensitive filesystems like FAT even on Linux.

@DL444

DL444 commented Sep 20, 2023

I was in a situation where I needed to perform some operations on a bunch of files except those specified by the user. On case-insensitive file systems, exceptions were to be matched case-insensitively, i.e. "file" would match "file", "File", and "FILE". On case-sensitive file systems, exceptions needed to match exactly, i.e. "file" would only match "file" and not "File" or "FILE".

I want to point out that the GetFileSystemInfos() trick referenced many times in the discussion is quite problematic. GetFileSystemInfos(string searchPattern) is an overload for GetFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) with enumerationOptions.MatchCasing set to PlatformDefault. So it wouldn't work if the default case sensitivity for the OS does not match the case sensitivity for the specific file system used.

For example, macOS is case-insensitive by default, and GetFileSystemInfos(string searchPattern) performs case-insensitive enumeration. But macOS also supports case-sensitive APFS volumes. Calling GetFileSystemInfos("file") on a case-sensitive volume that contains "File" and "FILE" will give you exactly those two files, neither of which is what you expected and which one comes first is undefined. This can easily cause data corruption if you are not careful.

Windows NTFS also supports this, enabled per-directory with fsutil file setCaseSensitiveInfo. I assume this is mostly for WSL, but it being used outside of WSL is still a possibility to account for if your application runs everywhere.

Most common Linux filesystems are case-sensitive by default. But there's a diverse ecosystem of file systems out there for Linux, I wouldn't be surprised if one of them is case-insensitive (take NTFS on Linux).

Therefore I would say assuming case sensitivity of the OS is probably not a robust design. Depending on your use case, you might be able to get away with an explicitly case-insensitive GetFileSystemInfos() followed by a [File/Directory].Exists() and reason over the item's existence and how many matches you get to infer case sensitivity of the file system and act accordingly. But I imagine this wouldn't perform well, and someone could pull the rug from under you by modifying the directory in between those two method calls. Therefore I wouldn't call this a proper solution.
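To make the inference described above concrete, here is a minimal sketch under stated assumptions. `CasingProbeSketch` and `TryGetActualName` are hypothetical names; `name` is assumed to contain no wildcard characters, since it is passed as a search pattern; and the race between the enumeration and the existence check is left unaddressed.

```csharp
using System;
using System.IO;

public static class CasingProbeSketch
{
    // Hypothetical helper: finds the on-disk spelling of `name` inside `directory`
    // without assuming the case sensitivity of the operating system.
    public static bool TryGetActualName(string directory, string name, out string actualName)
    {
        var options = new EnumerationOptions
        {
            MatchCasing = MatchCasing.CaseInsensitive, // explicit, not PlatformDefault
            AttributesToSkip = 0                       // include hidden and system entries
        };

        // Every entry whose name equals `name` when case is ignored.
        string[] candidates = Directory.GetFileSystemEntries(directory, name, options);

        // An exactly spelled entry is correct on any kind of file system.
        foreach (string candidate in candidates)
        {
            string candidateName = Path.GetFileName(candidate);
            if (string.Equals(candidateName, name, StringComparison.Ordinal))
            {
                actualName = candidateName;
                return true;
            }
        }

        // No exact spelling exists. If the supplied spelling still resolves,
        // the directory must be matching names case-insensitively.
        string probe = Path.Combine(directory, name);
        bool resolves = File.Exists(probe) || Directory.Exists(probe);

        if (resolves && candidates.Length == 1)
        {
            actualName = Path.GetFileName(candidates[0]);
            return true;
        }

        // Case-sensitive directory without an exact match, or an ambiguous result.
        actualName = string.Empty;
        return false;
    }
}
```

On a case-sensitive volume that contains only "File" and "FILE", a lookup for "file" returns false here instead of picking one of the two arbitrarily.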
