Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve client IP detection #534

Closed
docelic opened this issue Jan 16, 2018 · 12 comments · Fixed by #564
Closed

Improve client IP detection #534

docelic opened this issue Jan 16, 2018 · 12 comments · Fixed by #564

Comments

@docelic
Copy link
Contributor

docelic commented Jan 16, 2018

As part of handling a request, Amber may need to know the IP of the client.

If Amber server sits behind a proxy, then the client IP seen is the address of the proxy, and the proxy passes the real client IP in some HTTP header, such as X-Forwarded-For.
So, that other header must be consulted to get the real client IP for most purposes (access control, display of the IP address, logging, etc.).

Amber does currently have some support for this, and it does it in the following way:
(excerpts from src/amber/router/context.cr):

IP_ADDRESS_HEADERS = %w(REMOTE_ADDR CLIENT_IP X_FORWARDED_FOR
        X_FORWARDED X_CLUSTER_CLIENT_IP FORWARDED)

  def client_ip
    headers = request.headers
    val = {} of String => String
    IP_ADDRESS_HEADERS.find { |header|
      dashed_header = header.tr("_", "-")
      val = headers[header]? || headers[dashed_header]? || headers["HTTP_#{header}"]? || 
            headers["Http-#{dashed_header}"]?
    }
    val
  end

This approach has the following negative consequences:

  1. If the Amber server is not behind a proxy, then these various headers should not be looked up at all. (Looking them up (and taking them without discrimination) only allows users to spoof their IP address - this is even noted in the comments above the method)
  2. If the Amber server is behind a proxy, then the proxy must do extra work to delete all these possible headers from incoming requests (a total of 24 combinations per current code, in the worst-case scenario) and only set one real header in which it will store the actual client IP
  3. Even then, that header is by current Amber's code almost required to be named CLIENT_IP, because naming it any other way would cause more loops / checks to be ran on each request
  4. At least some local variables are created and some tr() replacements are run in each invocation of the client_ip() method
  5. It seems to me that client_ip should not be a method at all. Real client IP should be determined once and then stored somewhere in a variable or header as fixed information
  6. This implementation does not take into account that, in the future, support for PROXY protocol may be added to Amber or Crystal (https://www.haproxy.com/blog/haproxy/proxy-protocol/)

My suggestion is to improve the situation as follows:

  1. Allow Amber server configuration to specify the names of headers which will be checked for real client IP
  2. No transformations or variations of these names are to be generated during runtime (if a user wants 24 variations of header name, they simply need to list 24 variations in step Added community section #1)
  3. Default this value to empty list (no headers checked, no overhead in runtime, no conflict with PROXY protocol later)
  4. If some header names are specified, then perform efficient client IP detection once, and store the value in some fixed variable or header

Time permitting, I will work on implementing this. Comments welcome, thanks.

@docelic
Copy link
Contributor Author

docelic commented Jan 16, 2018

On a side note, it appears there is no remote_ip (the IP of the remote end of TCP connection) anywhere in HTTP::Request or HTTP::Context as per crystal-lang/crystal#453.

But the proposal there seems to be to add remote_ip to HTTP::Context. In that case, maybe client_ip should also be added somewhere there (in/by Amber, not Crystal).

So, remote_ip would be the remote end of the connection (added by Crystal in the future), while client_ip would be the real client IP (added by Amber).

And if there is no proxy (no headers configured or no PROXY protocol info sent), then client_ip would be set to equal value as remote_ip. That way Amber code could always just consult client_ip for real IP address of the client/visitor, and remote_ip for when the remote end of the current TCP connection is relevant.

@drujensen
Copy link
Member

@docelic Thanks for the detailed write up. I agree that we need to change the way this is implemented. What do you think about providing a proxy plug/middleware/handler to provide the client_ip and remote_ip? This is something someone could add/remove to the pipeline if they have a proxy in place. It can be configurable to the list of headers that should be checked.

Regarding the remote_ip, we were waiting for the crystal implementation because we felt it was better than building our own but that ticket has been in the queue for some time. I think our own implementation at this point makes sense.

@docelic
Copy link
Contributor Author

docelic commented Jan 16, 2018

Oh yes, a plug seems like a great idea for client_ip (and the ability/need to configure the plug will be extremely elegant when plugs get ability for in-place config params).
And for remote_ip, it seems to me we can't help that, that one must be done in Crystal, right? (Because already at the point when our handler is called, the IP is already gone/we can't see it any more)

@drujensen
Copy link
Member

Yeah, I think your right. I don't think we can get the socket connection remote ip. We can only parse a header provided by the proxy like the "X-Forwarded-For".

@docelic
Copy link
Contributor Author

docelic commented Jan 22, 2018

By the way, Elias added his comments regarding client_ip detection under #565 (which is a somewhat unrelated issue). I am copying the part referring to client_ip here:

eliasjpr commented:
"Also another reason why client_ip is not a handler. Since the client ip id dependent to external factors (proxy, web server, network setup, etc) I personally do not consider that is the framework responsibility to provide the client_ip since depending the kind of configuration the implementation can vary a lot and be too specific to a particular setup. For this reason it is a simple method on the context.
Now if someone would like to determine the client ip address for their specific setup they should do their own implementation.
For instance having a handler to pick the ip address from a header sometimes it is not sufficient since some setups (CDN) will set the client ip address in a permanent cookie. Because of this I personally have consider to remove the client_ip functionality altogether"

Here are my thoughts on this:

  1. Right, it is true that client_ip depends on external factors. But it seems it should be the framework's responsibility to provide a stable/uniform access to remote_ip and client_ip, or otherwise no standard functionality can be built on top of it. (Based on the discussion from Crystal's issue, seems like context.remote_ip and context.client_ip would be the right places.)

  2. The current approach with a method has problems as documented in [Session] Correctly names the session store file #543 and is running for every request, needed or not.

  3. The suggested new approach with a pipe doesn't prevent custom implementations. It makes them easier/more straightforward. If needed, people replace the ClientIp pipe with their own implementation to discover the value, and once they save it to context.client_ip, the rest continues using default functionality.

  4. For CDNs, I believe this is a different topic (not addressed by ClientIp). CDNs are requesting static content and client IPs on the app server side are not important. (A person does not care which IP was the first one to cause the resource to be cached on the CDN, because thousands of future accesses to the cached resource (from different IPs) on the CDN won't even reach the app server.)
    Someone using a CDN for static files generally shouldn't enable ClientIp in the static pipeline. (A commented ClientIp pipe was added to static pipeline template under the assumption that we are talking about end-user accesses through a load balancer, not CDN.)

@eliasjpr
Copy link
Contributor

The current approach with a method has problems as documented in #543 and is running for every request, needed or not.

  • The code to detecting the ip currently will not run unless the contex.client_ip method is called. Now in the instance of a handler (pipe in our case) it will always run.

@docelic
Copy link
Contributor Author

docelic commented Jan 22, 2018

Ah good point, you are certainly right about running only when client_ip() is called, my omission for thinking it runs always. (I assumed that this was used to determine the IP as a static value, not to go determine it every time client_ip is called).

Yes, for pipe it will run always if pipe is enabled. I believe this is OK because an IP address is an integral part involved in serving a request, logging, etc. (The fact that Crystal does not yet expose the remote end IP is amusing.)

@eliasjpr
Copy link
Contributor

For CDNs, I believe this is a different topic (not addressed by ClientIp). CDNs are requesting static content and client IPs on the app server side are not important.

  • We have production code where page are being served from CDN (Fastly) to serve cache content and our current configuration, is overriding the X-Forwarded header and setting other headers where the CDN thinks the real ip address is.

I honestly think that the framework can provide a basic implementation. but should not make too many assumptions.

@docelic
Copy link
Contributor Author

docelic commented Jan 22, 2018

(Right, but accesses from CDN are a completely different matter. CDN does not forward all requests to the app server as a proxy/LB does.)

@eliasjpr
Copy link
Contributor

I think we understand eachother @docelic.

@eliasjpr
Copy link
Contributor

@docelic
Copy link
Contributor Author

docelic commented Jan 23, 2018

Looks like this can be closed because other/related discussion is in the PR.

@docelic docelic closed this as completed Jan 23, 2018
eliasjpr pushed a commit that referenced this issue Jan 24, 2018
* Remove unconditional search for real client IP (#534)

* Pipe::ClientIp - saves client IP to Amber context

- Add Amber::Pipe::ClientIp
- It looks into configured header names (default: "X-Forwarded-For"),
  finds the first header that exists, and takes its last value
- Saves the value into context.client_ip as Socket::IPAddress(addr, 0)

- Covered by specs

- Along the way, sort in alphabetic order methods on the controller
  that are delegated to context for easier lookup and modification

* Add ClientIp pipe template to config/routes.cr

- Add Amber::Pipe::ClientIp to template config/routes.cr
- By default, add it in disabled state
- Even though ClientIp has default header ("X-Forward-For"), do mention
  name explicitly in the template, and also show an array to be as
  easy and intuitive as possible for users to understand what the header
  name is and that multiple headers are accepted

* Use first instead of last IP address (#564)

* Make non-matched header name more obvious (#564)

* Improvements as per discussion #564

- Fix/improve examples
- Extract HTTP::Server::Context extensions to extensions/http_extensions.cr
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants