-
-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-46208: Fix the relative path normalization of _Py_normpath() in fileutils.c #30362
Conversation
Only Windows pass the test. I'll check further. |
Please review. |
"//../foo" is an absolute path in POSIX. The ".." component gets resolved by the "//" root, so it normalizes as "//foo". AFAIK, in Linux "//" is functionally equivalent to "/". In Windows, "//" has to be followed by two path components to be a valid root, e.g. "//server/share". A ".." component should be resolved by a UNC root, but this PR introduces a bug: >>> os.path.normpath('//server/share/..')
'\\\\server\\share\\..' It's not important how "//.." and "//server/.." get resolved in Windows, since they're invalid, but they may as well absorb the ".." component if it doesn't complicate the implementation. |
Thanks for the truly helpful informations.
|
Python/fileutils.c
Outdated
@@ -2223,6 +2223,7 @@ _Py_normpath(wchar_t *path, Py_ssize_t size) | |||
wchar_t *pEnd = size >= 0 ? &path[size] : NULL; | |||
wchar_t *p2 = path; | |||
wchar_t *minP2 = path; | |||
int minP2_dotdot_joinable = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments on all of the variable declarations here describing what each one represents. the existing code names of p1, p2, minP2, etc. are not meaningful to reason about and understand an algorithm. This isn't supposed to be assembly code. Lets improve its readability as it gets fixed. :)
Otherwise I think your logic change in here is correct.
Python/fileutils.c
Outdated
@@ -2273,6 +2274,7 @@ _Py_normpath(wchar_t *path, Py_ssize_t size) | |||
*p2++ = *p1++; | |||
minP2 = p2; | |||
lastC = SEP; | |||
minP2_dotdot_joinable = 0; // follow pure python's caluculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you link to the code location of the pure Python code in question?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
The above quoted example is a bug in the pure Python implementation. The "//" path has been reduced to "/", which changes it from an invalid path to a valid path. The "//" path isn't valid since it translates to the NT path "\??\UNC\". A filesystem is never mounted at the root path of the "UNC" device. The "/" path, on the other hand, is a valid relative path. It's the root path of the drive or UNC share of the current working directory. The standard for correct behavior is WinAPI >>> nt._getfullpathname('//..')
'\\\\'
>>> nt._getfullpathname('//server/..')
'\\\\server\\' |
Could we actually call that instead of using our own implementation? |
No, it's only a standard for absolute paths, since it returns an absolute path. We use it to implement |
FYI, I think that https://gist.github.com/nascheme/553576e857f89083423858f174e23d10 I can make a PR but I have some work to do on it yet. Need to add in the MS Windows behaviour. To me, the C version doesn't match what ntpath does, e.g. with respect to UNC prefixes. I think we need more test cases. |
The pure Python >>> nt._getfullpathname('\\\\?\\C:////spam////eggs. . .')
'\\\\?\\C:\\spam\\eggs'
>>> nt._getfullpathname('\\\\.\\C:////spam////eggs. . .')
'\\\\.\\C:\\spam\\eggs' "\\.\" device paths never bypass normalization, and "\\?\" extended paths bypass normalization only when accessed, e.g. |
I added comments on the variable declarations and fixed BTW, the results of WinAPI below are correct?
|
They're correct in as much as the behavior of Whether ".." is ever a valid server or share name in a UNC path is questionable. Technically, it's an allowed registered name (reg-name, from RFC3986), and "." is an allowed pchar character (disallowed characters: It's clear that [1] Some SMB share names are allowed in Windows even if they're not normal filenames, such as "spam..". This can be dysfunctional. For example, |
Python/fileutils.c
Outdated
p2 = p3 + 1; | ||
} else { | ||
p2 = p3; | ||
if (p2 == minP2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just be able to merge this into the condition at the original line 2295.
Python/fileutils.c
Outdated
@@ -2265,6 +2266,7 @@ _Py_normpath(wchar_t *path, Py_ssize_t size) | |||
} | |||
} | |||
minP2 = p2; | |||
is_absolute = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reviewing this, it seems like we might need to be setting minP2
one character prior here (probably just moving it to p2
every time we copy a SEP
at 2262 but before the increment).
That should leave it pointing at the slash, which is interpreted below as meaning we have an absolute path.
Python/fileutils.c
Outdated
@@ -2273,6 +2275,7 @@ _Py_normpath(wchar_t *path, Py_ssize_t size) | |||
*p2++ = *p1++; | |||
minP2 = p2; | |||
lastC = SEP; | |||
is_absolute = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if we set minP2
here to point at the second slash, it'll act like an absolute path without the need for more state. But that isn't how we currently treat it, though I'm not sure that we treat it right at the moment anyway.
Technically, this is probably best treated like two leading slashes on Windows, but only preserving one segment instead of two. At least, that's based on my reading of the spec. I've never seen these used in real life (apart from Blender using them subtly differently), so I don't know what the correct expectation would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The POSIX spec says the following in 4.13 Pathname Resolution:
If a pathname begins with two successive characters, the first component following the leading characters may be interpreted in an implementation-defined manner, although more than two leading characters shall be treated as a single character.
Is this implying that "//.." actually should not be normalized as "//"? In Linux, a "//" root is effectively the same as "/". For example, os.stat('//..')
queries the root directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what it implies. Linux could be using //
to imply that the following segment is the top-level directory by that name, but if we're following that rule we should normalise all paths starting with //
to start with /
and we don't.
If we're allowing paths to use //
as defined, we shouldn't be doing anything to change the first segment, because we don't know how it's going to be interpreted.
But that's not the release blocking bug we're facing here, and I'm not aware of anyone raising the issue about it before, so let's just get the smallest fix in to unblock the release and can dive into obscure correctness cases later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the posix spec: It's better for us to aim for strict compatibility with prior Python versions than new interpretations of specs without a practical bug being filed about something not matching a spec exactly causing a logic problem that prevents a desirable use case and on which platform(s).
Otherwise behavior change creates churn for all users who have code depending on past Python stdlib existing behaviors (whether they know it or not or explicitly test for it or not). Hyrum's Law applies here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought it up because it was suggested to shift minP2
past the first component. If so, minP2
would have to be special cased for "//.." if it should normalize as "//", per the existing test case. But using "//x" as the root isn't workable anyway because normalizing "//etc/.." as "//etc" wouldn't match the behavior of Linux.
In Cygwin and MSYS2, "//" is used for UNC paths. In MSYS2, I confirmed that the stat result for "//" is different from the result for "/"; that the stat results for "//", "//..", and "//localhost/.." are all the same; and that the stat result for "//localhost/C$/.." is the same as the result for "//localhost". This is compatible with the current implementation of normpath()
.
The following UNIX stack exchange question discusses current and defunct systems and applications that special case "//".
https://unix.stackexchange.com/questions/256497/on-what-systems-is-foo-bar-different-from-foo-bar
Please check. |
@@ -329,13 +329,22 @@ def test_expanduser_pwd(self): | |||
("/..", "/"), | |||
("/../", "/"), | |||
("/..//", "/"), | |||
("//.", "//"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirmed: this matches behavior of 3.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me. Anyone else taking a look? Or shall I merge
The Azure Pipelines PR failure is the test_ftplib issue - doesn't have to block this issue. |
For Windows, should there be tests for invalid UNC paths such as "//", "//..", "//../..", "//../../..", "//server", "//server/..", and "//server/../.."? This will help to ensure that future changes never allow an invalid path to be normalized as a valid path. Also, it's not a major problem that should prevent merging, but the way repeated slashes are handled prior to the second component of a UNC path is less than ideal: >>> os.path.normpath('//spam///eggs')
'\\\\spam\\\\eggs'
>>> os.path.normpath('//spam///eggs/..')
'\\\\spam\\\\' This case isn't a valid UNC share, since it's just "//spam", without a share component. However, the repeated slashes start the filepath part and should be reduced to a single backslash. That's what the >>> os.path.abspath('//spam///eggs')
'\\\\spam\\eggs'
>>> os.path.abspath('//spam///eggs/..')
'\\\\spam\\' |
We've never gotten UNC paths correct in every way. They're definitely different now from 3.10, but let's file that as a new bug and try to deal with it before beta. |
Thanks all for reviewing and merging.
I think |
And thanks @neonene for the fix and identifying some followup fixes too! Confirmed fixed in |
https://bugs.python.org/issue46208