Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4605 match.params aren't URI decoded - further test #6578

Closed

Conversation

RichardBradley
Copy link

@RichardBradley RichardBradley commented Feb 8, 2019

Hi,

I think that #4605 is still not fixed, as this test demonstrates.

I will attempt to fix, but I can't guarantee I will succeed. I think the fix needs to be in history, and that the path component decoding function needs to be decodeURIComponent, rather than decodeURI. For that to work, we need to rearrange the parsing so that decoding happens after parsing (I think this commit does it before parsing), otherwise the parse will be ambiguous.

I think that the current code shows confusion around types, e.g. the line I linked to above pathname = decodeURI(pathname) shows that the distinction between encoded URIs and values used as path parameters is not being properly maintained, which has led to this bug IMO. Decoded path parameters are of a different type to encoded URIs, so the two should not be stored in the same variable without inviting confusion and bugs.

Thanks,

Rich

@RichardBradley
Copy link
Author

RichardBradley commented Feb 8, 2019

I have prepared a fix, which I have added to this PR.

I don't think that the URI decoding relevant to this bug should be happening in history at all, and I think that the decodeURI that was added in remix-run/history#442 is quite suspicious and may be incorrect. (Maybe try reverting it and see if the tests still pass after this commit?)

I believe that the history module should pass the paths to react-router, including any necessary URI encoding that is required for them to be valid URIs. When react-router extracts path params from these URIs, it should apply decodeURIComponent, because the values are moving from a URI typed domain to a string typed domain. I have done this in the attached commit (1967020) and all tests now pass.

Hope this helps,

Rich

@RichardBradley
Copy link
Author

RichardBradley commented Feb 8, 2019

This will also fix #5607 and #5472, but I can't comment on those as they are both "locked" (why?)

@pshrmn
Copy link
Contributor

pshrmn commented Apr 2, 2019

With history v5, we will be removing pathname decoding. Both v4 and v5 of history will remain compatible with RRv5 (and v4). With the React Router and history v6 releases, we will be switching to non-decoded pathnames (they should always be encoded, but the user will be responsible for providing the correct encoding), but for the time being it will be possible for pathnames to be either encoded or non-encoded. I bring this up for two reasons:

  1. I worry that we cannot safely call decodeURIComponent on the params until we are certain that the pathname is encoded. The primary edge case for this would be if the decoded pathname has a percent sign (either a valid octet that gets double decoded or an invalid octet that throws an error).
decodedLocation.pathname = "/message/%20 is the space octet"
<Route path="/message/:message" />
params = { message: "  is the space octet" }
  1. We should provide a way for people using history v5 to have decoded params. Encoded pathnames is the safe choice, but having to constantly decode params yourself is annoying.

I haven't discussed this with any of the other maintainers, but I think that it might be beneficial to add a decode prop to the <Route> (only for v5), which would signal that we can safely decode the params.

<Route path="/artist/:name" decode ... />

That should solve the problem in this PR while letting anyone that knows that they have params that are unsafe to decode to skip the decoding. (The inverse solution of providing a prop to not decode is also possible)

v6 will expect pathnames to be encoded, so we should probably be able to safely decode params there.

Thoughts?

@RichardBradley
Copy link
Author

I don't fully understand your comment. My initial reaction is that it doesn't really make sense, and you are trying to shoe-horn two incompatible concepts (URLs and some kind of more general "path") into a single library, which is causing you irreconcilable type errors.

I think you need to 1) actually specify what /type/ is held in the various values here and 2) enforce that consistently (which is hard in an untyped language). I think you are building on sand by proposing a user-specified "encoding" which may or may not be compatible with URL encoding.

In the meantime, I do think the proposed patch here is a) correct, b) an improvement on what is currently in the codebase and c) fixes a real bug. So I would suggest merging this patch and moving more theoretical discussions about "the user will be responsible for providing the correct encoding" in future versions to a different thread.

@pshrmn
Copy link
Contributor

pshrmn commented Apr 2, 2019

I will move the decoding issue elsewhere, but I'm against merging this PR at this point. It might be the correct decision in v6, but we'll cross that bridge when we get there.

@RichardBradley
Copy link
Author

Are you saying that the path params should never be decoded?
So for a route like <Route path="/:parentId/:childId", a URL like /a%2fb/c should give values {parentId: "a%2fb", childId: "c"}?

I think that is unconventional to the point of being wrong, but I suppose it could work if it is documented and applied consistently.

Currently the codebase is somewhere half way between the two viewpoints.

@pshrmn
Copy link
Contributor

pshrmn commented Apr 2, 2019

I think that a param should be decoded. My worry is that if we call decodeURIComponent on a string that is already decoded, we run the risk of double decoding an octet-like string.

This is all stemming from the history v4 implementation where a pathname is a decoded string. Any percent-encoded octets in the string should not be considered encoded (in the param %20 is the space octet, %20 is a string containing three characters, not the octet for a single space character). If we call decodeURIComponent on that string, the octet will be converted to a space ( is the space octet).

I really just don't want to further break things by adding automatic decoding unless we can be certain that it will give the correct output. Once all pathnames are encoded, we will know that any percent-encoded octets are "real" octets, so decoding would be safe.

Maybe I'm overthinking it; v4's behavior would already make the %20 is... pathname buggy, so it is possible that automatic decoding wouldn't break anything that isn't already broken. 🤷‍♂️

@RichardBradley
Copy link
Author

Maybe I'm overthinking it; v4's behavior would already make the %20 is... pathname buggy, so it is possible that automatic decoding wouldn't break anything that isn't already broken. 🤷‍♂️

I think so.

I would bet that everyone is using "web safe" path params, because they can't rely on the current behaviour. I don't think you need to worry about backwards-compatibility here, as there is clearly a bug, as evidenced by #4605, this issue and the unit tests I added.

@pshrmn
Copy link
Contributor

pshrmn commented Apr 2, 2019

I'm still slightly hesitant about blanket decoding every param. It is probably the ideal default behavior, but I wonder if anyone would want the ability to keep their params encoded. I'll leave the decision of whether to merge this as is or allow non-decoded params up to the other maintainers.


As an aside, I think that the title of this PR could be more clear. Something like "automatically decode params using decodeURIComponent" or "decode param characters not decoded by decodeURI". The latter describes what the PR fixes today, while the former describes what RR will want to do once pathnames are encoded.

// characters not decoded by decodeURI
; , / ? : @ & = + $ #

@RichardBradley
Copy link
Author

I wonder if anyone would want the ability to keep their params encoded

I can't think of any reason why they might want this.
This doesn't make sense to me at all. There is no option in JSON.parse to keep the fields "encoded"; why should a URL decomposing lib offer an option to incorrectly decompose a URL?

I'll leave it with you.

As an aside, I think that the title of this PR could be more clear. Something like "automatically decode params using decodeURIComponent" or "decode param characters not decoded by decodeURI". The latter describes what the PR fixes today, while the former describes what RR will want to do once pathnames are encoded.

I disagree, I think "match.params aren't URI decoded" is a reasonable description of the bug.

Part of the confusion might be that the builtin function "decodeURI" is poorly named and not very useful, for historical reasons.
Notice that the MDN docs for decodeURI do not mention any RFCs or explain when it is valid to use this function :-(

This SO explains some of the differences: https://stackoverflow.com/a/3608791/8261

"decodeURIComponent"/"encodeURIComponent" is correct for URL path parameters, which is the context we are discussing.

@stale
Copy link

stale bot commented Sep 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 10, 2019
@RichardBradley
Copy link
Author

I really hate "stale bot".

This is a real bug in react-router. Even though no-one is currently working on it at present, I don't think this issue should be closed.

@stale stale bot removed the stale label Sep 11, 2019
@max-degterev
Copy link

max-degterev commented Sep 11, 2019

What is the recommended workaround for this issue? Just noticed it upgrading from v3 to v5

@loganvolkers
Copy link

@suprMax We found a solution!

The biggest cause of this bug was actually pointed earlier in this thread. Huge thank you and props to @RichardBradley for the deep digging to find the baneful use of decodeURI.

I don't think that the URI decoding relevant to this bug should be happening in history at all, and I think that the decodeURI that was added in remix-run/history#442 is quite suspicious and may be incorrect. (Maybe try reverting it and see if the tests still pass after this commit?)

It appears that @pshrmn didn't really understand the URI specification and has introduce this situation.

The workaround is to:

  1. Manually encodeURI the path before sending it to StaticRouter. Otherwise your URL will be partially decoded only some of the time.
  2. Manually decode match.params

I've got a sandbox showing this off.

vpstuart pushed a commit to EBMSPTYLTD/history that referenced this pull request May 19, 2020
decodeURI decodes paths in a very strange way. This should hopefully solve remix-run#786.

This will need to be coupled with a PR in react-router remix-run/react-router#6578.
@vpstuart
Copy link

Submitted PR on history project remix-run/history#797 that I'm hoping is the other half of this issue.

I am in favor of this PR, as it causes some weird gaps in support for %-encoded path segments for me which at the moment might force a home-grown replacement for a project, which I'd really rather avoid.

@ryanflorence
Copy link
Member

Thanks for everybody's work on this. I'm closing issues that are old, if they are still problems please open a new issue and reference this one. We will be a lot more attentive now that our software is our sole focus and we survived 2020 😅. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants