Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Windows path prefix parsing #94887

Merged
merged 2 commits into from
Apr 23, 2022
Merged

Improve Windows path prefix parsing #94887

merged 2 commits into from
Apr 23, 2022

Conversation

dylni
Copy link
Contributor

@dylni dylni commented Mar 12, 2022

This PR fixes improves parsing of Windows path prefixes. parse_prefix now supports both types of separators on Windows (/ and \).

@rust-highfive
Copy link
Collaborator

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 12, 2022
@rust-log-analyzer

This comment has been minimized.

@dylni
Copy link
Contributor Author

dylni commented Mar 12, 2022

@rustbot label: +O-windows

@rustbot rustbot added the O-windows Operating system: Windows label Mar 12, 2022
library/std/src/sys/windows/path/absolute.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/path/absolute.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/path/absolute/tests.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/path/absolute/tests.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/path/absolute/tests.rs Outdated Show resolved Hide resolved
library/std/src/sys/windows/path/absolute.rs Outdated Show resolved Hide resolved
@dylni
Copy link
Contributor Author

dylni commented Mar 15, 2022

@ChrisDenton Thanks for the thorough review. It seems that the changes to path::absolute should be unnecessary, but it would be helpful to add a comment about the initial buffer size mitigating the WinAPI bug. I'll update the PR in the coming days.

What's your opinion about changing parse_prefix? The current implementation ensures that the prefix will be valid in all contexts, but it causes Path::join to miss some prefixes. The current behavior might be better, since I don't see why a Unix path separator would be used for a Windows-specific prefix.

@dylni dylni marked this pull request as draft March 15, 2022 03:54
@ChrisDenton
Copy link
Member

Hm, I don't think there's an ideal solution but on balance I do think it would be better to stick to the current behavior for the sake of avoiding inconsistencies when using the path. The effect it has on Path::join is unfortunate but, as you say, in practice it should at least be rarely (if ever) encountered.

@dylni dylni changed the title Move "normpath" crate impl to libstd Improve Windows path parsing, similarly to "normpath" Mar 19, 2022
@dylni
Copy link
Contributor Author

dylni commented Mar 19, 2022

@ChrisDenton I think the new implementation will fix both problems. //?/ would now be parsed as a device path.

@dylni dylni changed the title Improve Windows path parsing, similarly to "normpath" Improve Windows path parsing Mar 19, 2022
@dylni dylni changed the title Improve Windows path parsing Improve Windows path prefix parsing Mar 19, 2022
@rust-log-analyzer

This comment has been minimized.

@dylni dylni marked this pull request as ready for review March 19, 2022 14:12
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@ChrisDenton ChrisDenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing //?/ as a device path looks like a good solution to me.

I only have a couple of potential performance notes but nothing essential.

library/std/src/sys/windows/path.rs Outdated Show resolved Hide resolved
Comment on lines 310 to 316
// NULs in verbatim paths are rejected for consistency.
if path.bytes().contains(&0) {
return Err(io::const_io_error!(
io::ErrorKind::InvalidInput,
"strings passed to WinAPI cannot contain NULs",
));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this only be done if the prefix is actually verbatim? Otherwise this check is essentially done twice in the non-verbatim case. It probably doesn't matter too much though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm conflicted about this. The newest implementation is likely the most performant, but maybe it would make more sense to always call to_u16s to avoid code duplication.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@ChrisDenton
Copy link
Member

I apologise for the delay here. Since it's been a little while, would you be able to rebase on a recent master just to be sure there are no conflicts? And run library/std tests if possible?

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Apr 17, 2022
@dylni
Copy link
Contributor Author

dylni commented Apr 17, 2022

@ChrisDenton No problem. I've rebased, and all tests now pass. For clarity, the paths will be parsed as UNC paths, not device paths, since Prefix::DeviceNS has a very specific prefix. I also made a minor documentation update.

The second commit is new and removes an unnecessary function containing unsafe code. I can remove the commit if it should be done in a separate PR.

Latest Diff for First Commit
diff --git a/library/std/src/path.rs b/library/std/src/path.rs
index 8ecea8ce07f..c03d197e019 100644
--- a/library/std/src/path.rs
+++ b/library/std/src/path.rs
@@ -168,8 +168,8 @@ pub enum Prefix<'a> {
 
     /// Device namespace prefix, e.g., `\\.\COM42`.
     ///
-    /// Device namespace prefixes consist of `\\.\` immediately followed by the
-    /// device name.
+    /// Device namespace prefixes consist of `\\.\` (possibly using `/`
+    /// instead of `\`), immediately followed by the device name.
     #[stable(feature = "rust1", since = "1.0.0")]
     DeviceNS(#[stable(feature = "rust1", since = "1.0.0")] &'a OsStr),
 
diff --git a/library/std/src/path/tests.rs b/library/std/src/path/tests.rs
index d1f59d2786e..0d8ea29c2be 100644
--- a/library/std/src/path/tests.rs
+++ b/library/std/src/path/tests.rs
@@ -971,15 +971,15 @@ pub fn test_decompositions_windows() {
     file_prefix: None
     );
 
-    t!("\\\\?\\C:/foo",
-    iter: ["\\\\?\\C:/foo"],
+    t!("\\\\?\\C:/foo/bar",
+    iter: ["\\\\?\\C:", "\\", "foo/bar"],
     has_root: true,
     is_absolute: true,
-    parent: None,
-    file_name: None,
-    file_stem: None,
+    parent: Some("\\\\?\\C:/"),
+    file_name: Some("foo/bar"),
+    file_stem: Some("foo/bar"),
     extension: None,
-    file_prefix: None
+    file_prefix: Some("foo/bar")
     );
 
     t!("\\\\.\\foo\\bar",
diff --git a/library/std/src/sys/windows/path.rs b/library/std/src/sys/windows/path.rs
index 7bf6f4a9b1f..cbe6d1630e1 100644
--- a/library/std/src/sys/windows/path.rs
+++ b/library/std/src/sys/windows/path.rs
@@ -127,14 +127,14 @@ pub fn parse_prefix(path: &OsStr) -> Option<Prefix<'_>> {
                 Some(VerbatimUNC(server, share))
             } else {
                 let path = parser.finish();
-                let (prefix, _) = parse_next_component(path, true);
 
                 // in verbatim paths only recognize an exact drive prefix
-                if let Some(drive) = parse_drive_exact(prefix) {
+                if let Some(drive) = parse_drive_exact(path) {
                     // \\?\C:
                     Some(VerbatimDisk(drive))
                 } else {
                     // \\?\prefix
+                    let (prefix, _) = parse_next_component(path, true);
                     Some(Verbatim(prefix))
                 }
             }
@@ -166,24 +166,24 @@ pub fn parse_prefix(path: &OsStr) -> Option<Prefix<'_>> {
 }
 
 // Parses a drive prefix, e.g. "C:" and "C:\whatever"
-fn parse_drive(prefix: &OsStr) -> Option<u8> {
+fn parse_drive(path: &OsStr) -> Option<u8> {
     // In most DOS systems, it is not possible to have more than 26 drive letters.
     // See <https://en.wikipedia.org/wiki/Drive_letter_assignment#Common_assignments>.
     fn is_valid_drive_letter(drive: &u8) -> bool {
         drive.is_ascii_alphabetic()
     }
 
-    match prefix.bytes() {
+    match path.bytes() {
         [drive, b':', ..] if is_valid_drive_letter(drive) => Some(drive.to_ascii_uppercase()),
         _ => None,
     }
 }
 
 // Parses a drive prefix exactly, e.g. "C:"
-fn parse_drive_exact(prefix: &OsStr) -> Option<u8> {
+fn parse_drive_exact(path: &OsStr) -> Option<u8> {
     // only parse two bytes: the drive letter and the drive separator
-    if prefix.len() == 2 { parse_drive(prefix) } else { None }
+    if path.bytes().get(2).map(|&x| is_sep_byte(x)).unwrap_or(true) {
+        parse_drive(path)
+    } else {
+        None
+    }
 }
 
 // Parse the next path component.
diff --git a/library/std/src/sys/windows/path/tests.rs b/library/std/src/sys/windows/path/tests.rs
index 0bf5bbe4883..8656b04e4f4 100644
--- a/library/std/src/sys/windows/path/tests.rs
+++ b/library/std/src/sys/windows/path/tests.rs
@@ -108,7 +108,7 @@ fn test_parse_prefix_verbatim() {
 
 #[test]
 fn test_parse_prefix_verbatim_device() {
-    let prefix = Some(Prefix::DeviceNS(OsStr::new("?")));
+    let prefix = Some(Prefix::UNC(OsStr::new("?"), OsStr::new("C:")));
     assert_eq!(prefix, parse_prefix(r"//?/C:/windows/system32/notepad.exe"));
     assert_eq!(prefix, parse_prefix(r"//?/C:\windows\system32\notepad.exe"));
     assert_eq!(prefix, parse_prefix(r"/\?\C:\windows\system32\notepad.exe"));

@ChrisDenton
Copy link
Member

Ok, I've been testing with the new handling for //?/ and I do think it is an improvement on the previous behaviour. While hopefully //?/ paths are never actually used anywhere, if they are then treating them as UNC is better than being seen as root relative paths.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 23, 2022

📌 Commit fb9731e has been approved by ChrisDenton

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 23, 2022
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 23, 2022
@bors
Copy link
Contributor

bors commented Apr 23, 2022

⌛ Testing commit fb9731e with merge 8834629...

@bors
Copy link
Contributor

bors commented Apr 23, 2022

☀️ Test successful - checks-actions
Approved by: ChrisDenton
Pushing 8834629 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 23, 2022
@bors bors merged commit 8834629 into rust-lang:master Apr 23, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 23, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8834629): comparison url.

Summary:

  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: no relevant changes found
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 1 0 0 0 1
mean2 0.5% N/A N/A N/A 0.5%
max 0.5% N/A N/A N/A 0.5%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants