Slashes in URI path #2934

Open
IS4Code opened this issue Oct 6, 2022 · 4 comments

Comments

@IS4Code
Contributor

IS4Code commented Oct 6, 2022

I have two general concerns regarding redirection, which (as far as I can tell) are caused by the server running w3id.org and thus are not directly fixable in this codebase:

  • Apache handles multiple or encoded slashes in the URI in a special fashion, but this is not desirable for a redirection service. To fix this, these directives have to be used in the Apache (virtual host) configuration, not in .htaccess:

    AllowEncodedSlashes On
    MergeSlashes Off
    

    Since the target server will most likely have the default behaviour anyway, this is not a security issue. I am considering moving from purl.org to w3id.org, and this is something I would need (purl.org doesn't even merge slashes; it simply outputs "Internal Server Error", and they still haven't fixed it although I reported it months ago).

  • (Already resolved.) w3id seems to redirect HTTP to HTTPS in all situations. I don't think this is necessary, and it adds little security: a potential attacker could hijack the first HTTP request, find out the path, and then determine exactly where it redirects, since this repository is public. Redirecting to HTTPS won't fix anything, since the first (and only) request is already insecure.

    I think it would be best to simply honor whichever protocol the client wishes to use, and let it be the concern of the actual redirecting .htaccess or the target server to do otherwise. Or, if that is considered inappropriate, you could decide based on the presence of the Upgrade-Insecure-Requests header which is sent by modern browsers anyway, and do not redirect if it is not present (under the assumption that the client wishes to use and continue using HTTP in that case).

    This is not critical to me, but it seems like a good thing to do. TLS versions come and go, and at some point in the future, old TLS versions could stop being supported by this service, making these "persistent" URIs break for clients that can't use the newer version, and using HTTP is more likely to continue working. After all, requiring HTTPS is again the matter of the target server.
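The header-based alternative suggested above can be sketched as a small decision function. This is only an illustration of the proposal, not how w3id.org is configured; the function name `should_redirect_to_https` and the plain-dict representation of headers are assumptions made for the example.

```python
def should_redirect_to_https(scheme: str, headers: dict[str, str]) -> bool:
    """Decide whether a request should be redirected to HTTPS.

    Sketch of the proposal: redirect only when the client signals that it
    prefers an upgrade via the Upgrade-Insecure-Requests header.
    """
    if scheme.lower() == "https":
        # Already secure, nothing to upgrade.
        return False
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("upgrade-insecure-requests") == "1"


# A browser that sends the header gets redirected; a client without it
# (for example an older tool that can only speak HTTP) does not.
print(should_redirect_to_https("http", {"Upgrade-Insecure-Requests": "1"}))
print(should_redirect_to_https("http", {}))
```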

@davidlehn
Collaborator

  • Apache handles multiple or encoded slashes in the URI in a special fashion, but this is not desirable for a redirection service. To fix this, these directives have to be used in the Apache (virtual host) configuration, not in .htaccess:
    AllowEncodedSlashes On
    MergeSlashes Off
    

[...]

What are the use cases you are concerned with? What are some example URLs that would be problematic?

There has been a past issue or two with encoded slashes, but I think those were related to trying to put URLs in a path and having regex issues and they just moved to using query params.

In general, changing server-level flags is problematic since it will affect every request. And it's quite difficult to determine whether changing these flags would have side effects.

For my reference:
https://httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes
https://httpd.apache.org/docs/2.4/mod/core.html#mergeslashes

  • w3id seems to redirect HTTP to HTTPS in all situations. [...]

Yes, that is intentional. What are the use cases you are concerned with?

I think it would be best to simply honor whichever protocol the client wishes to use, and let it be the concern of the actual redirecting .htaccess or the target server to do otherwise. Or, if that is considered inappropriate, you could decide based on the presence of the Upgrade-Insecure-Requests header which is sent by modern browsers anyway, and do not redirect if it is not present (under the assumption that the client wishes to use and continue using HTTP in that case).

This service predates various modern secure HTTP features. We should likely look into more updates from time to time. Though I think our direction would be to just use ones that force HTTPS for everything.

When this service was started, we decided to be secure by default. I think it's a hard sell to not keep this policy. If clients use the "https" endpoint as was intended, a large class of security issues is handled. I think we may have even discussed not supporting HTTP at all, but left in the global redirect to https mainly for the use case of hand typed URLs. Unfortunately, some people still mint "http" ids in vocabs. I try to correct them when I see this since resolving those URLs just adds extra redirects. But if they are just string ids, it doesn't matter that much.

This is not critical to me, but it seems like a good thing to do. TLS versions come and go, and at some point in the future, old TLS versions could stop being supported by this service, making these "persistent" URIs break for clients that can't use the newer version, and using HTTP is more likely to continue working. After all, requiring HTTPS is again the matter of the target server.

The concern is valid, though the timeline on this is many years, if not decades. We already partly address the issue by leaving old TLS support enabled, despite potential security issues. TLS checkers complain about it and mark down our "score". Clients hopefully use proper modern connections, so it's not a real issue. If clients are using old versions, that's really their issue; they know the risks, but they are hopefully supported for as long as practical.

Looking at the logs for the last ~4M requests, 3.7M are TLS1.3, 300k are TLS1.2, 8 are TLS1.1, and 2k are TLS1. A quick glance at the old TLS requests suggests most are from a couple of real IPs, maybe a few dozen other real ones, and some random "/" and junk requests. So in practice this isn't a large issue at the current time.
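To put those counts in proportion, here is a quick calculation of the share of requests using the older protocol versions. The figures are the rounded ones quoted above, so the result is approximate.

```python
# Rounded request counts per TLS version, as quoted in the comment above.
counts = {
    "TLS1.3": 3_700_000,
    "TLS1.2": 300_000,
    "TLS1.1": 8,
    "TLS1": 2_000,
}

total = sum(counts.values())
legacy = counts["TLS1.1"] + counts["TLS1"]
legacy_share = legacy / total

# Prints the legacy count, the total, and the share as a percentage.
print(f"{legacy} of {total} requests ({legacy_share:.3%}) used TLS 1.1 or older")
```

With these inputs the old versions account for roughly 0.05% of requests.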

@IS4Code
Contributor Author

IS4Code commented Oct 7, 2022

What are the use cases you are concerned with? What are some example URLs that would be problematic?

I am hoping to create redirects for something like https://uri4uri.is4.site/uri.html/https://domain/path%3Fquery%23fragment. At the moment, it would change it to https:/domain/path?query%23fragment (merge //) due to MergeSlashes, and I cannot encode it with %2F since AllowEncodedSlashes is not enabled (I get 404).
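The two effects can be imitated in a few lines of Python. This is a rough simulation, not Apache's actual implementation: `merge_slashes` and `has_encoded_slash` are hypothetical helpers that only mimic what `MergeSlashes On` and `AllowEncodedSlashes Off` (the Apache defaults) do to the path from the example URL.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse runs of '/' into one, roughly what MergeSlashes On does."""
    return re.sub(r"/{2,}", "/", path)


def has_encoded_slash(path: str) -> bool:
    """True if the path contains a percent-encoded slash (%2F or %2f).

    With AllowEncodedSlashes Off, Apache refuses such requests with 404.
    """
    return re.search(r"%2f", path, flags=re.IGNORECASE) is not None


raw_path = "/uri.html/https://domain/path%3Fquery%23fragment"
merged_path = merge_slashes(raw_path)
print(merged_path)  # /uri.html/https:/domain/path%3Fquery%23fragment

# Encoding the slashes avoids the merge but trips the encoded-slash check.
encoded_path = "/uri.html/https:%2F%2Fdomain%2Fpath%3Fquery%23fragment"
print(has_encoded_slash(encoded_path))  # True
```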

When this service was started, we decided to be secure by default. I think it's a hard sell to not keep this policy. If clients use the "https" endpoint as was intended, a large class of security issues is handled. I think we may have even discussed not supporting HTTP at all, but left in the global redirect to https mainly for the use case of hand typed URLs. Unfortunately, some people still mint "http" ids in vocabs. I try to correct them when I see this since resolving those URLs just adds extra redirects. But if they are just string ids, it doesn't matter that much.

Fair enough.

@awagner-mainz
Contributor

For reference, #2853 is probably the issue davidlehn has mentioned.

IS4Code changed the title from "Slashes and HTTPS redirection" to "Slashes in URI path" on Oct 13, 2022
@IS4Code
Contributor Author

IS4Code commented Nov 26, 2022

I've settled on a workaround for MergeSlashes (see #3010) by parsing %{THE_REQUEST} and extracting the raw path before it can be modified. However, for AllowEncodedSlashes I don't think a workaround is possible, since the request is rejected outright. It's not perfect, but still better than purl.org.
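The idea behind the workaround is that %{THE_REQUEST} holds the request line exactly as the client sent it (for example `GET /path HTTP/1.1`), so the path in it has not yet been normalized. The Python below is only an illustration of that extraction step, not the rewrite rules used in #3010; the function names and the `/uri.html/` prefix handling are assumptions for the example.

```python
def raw_path_from_request_line(request_line: str) -> str:
    """Extract the unmodified path from an HTTP request line.

    The request line has the form "<method> <target> <version>"; the target
    is returned without its query string, with slashes left untouched.
    """
    parts = request_line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {request_line!r}")
    _method, target, _version = parts
    path, _sep, _query = target.partition("?")
    return path


def embedded_target(raw_path: str, prefix: str) -> str:
    """Return whatever follows the (hypothetical) redirect prefix."""
    if not raw_path.startswith(prefix):
        raise ValueError(f"path does not start with {prefix!r}")
    return raw_path[len(prefix):]


request_line = "GET /uri.html/https://domain/path%3Fquery%23fragment HTTP/1.1"
raw = raw_path_from_request_line(request_line)
print(embedded_target(raw, "/uri.html/"))  # https://domain/path%3Fquery%23fragment
```

Because the double slash after `https:` is read from the raw request line, it survives even when the server merges slashes in the normalized path.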
