Use node 7+ WHATWG parser for hostname, fixes #1002 #1004

Merged: 4 commits, Jun 20, 2017

Contributor

@fl0w fl0w commented Jun 17, 2017

@jonathanong

> really, the ideal solution is a separate module that supports it. if you know of any, let us know!

Well, there's the node 7+ WHATWG compliant url.URL, which could potentially help Koa "de-depend" parseurl and querystring as well (I think).

This PR could be expanded to back the public API for host/hostname/query (and maybe a few more) with url.URL.
There would be some minor adjustments to output, such as URL.port returning '' when the scheme's default port is given (:80 for http, :443 for https; both are special schemes).
(performance impacted too heavily)

I'm also assuming url.URL would come with some major performance hit.

const { URL } = require('url')
[
  "https://github.com",
  "http://www.github.com:80",
  "https://127.0.0.1:8080",
  "http://[::1]",
  "https://[::1]:801",
  "https://[2001:db8::1428:57ab]",
  "http://[2001:db8::1428:57ab]:805/index.js"
].forEach(url => {
  let parsed = new URL(url)
  console.log(`${url} - hostname: ${parsed.hostname}, port: ${parsed.port}`)
})

$ node -v
v8.1.0
$ node whatwg.js
https://github.com - hostname: github.com, port: 
http://www.github.com:80 - hostname: www.github.com, port: 
https://127.0.0.1:8080 - hostname: 127.0.0.1, port: 8080
http://[::1] - hostname: [::1], port: 
https://[::1]:801 - hostname: [::1], port: 801
https://[2001:db8::1428:57ab] - hostname: [2001:db8::1428:57ab], port: 
http://[2001:db8::1428:57ab]:805/index.js - hostname: [2001:db8::1428:57ab], port: 805

Notice that an explicit :80 on an http URL comes back as an empty string: http and https are special schemes, so their default ports are implied rather than reported.
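
Because URL.port is empty for a scheme's default port, code that needs a numeric port has to fall back to the scheme default itself. A minimal sketch of that fallback, assuming a hypothetical helper named effectivePort and a hypothetical DEFAULT_PORTS table (neither is part of Koa or Node):

```javascript
const { URL } = require('url');

// Hypothetical lookup table covering only the two schemes discussed here.
const DEFAULT_PORTS = { 'http:': 80, 'https:': 443 };

// Hypothetical helper: returns the port as a number, falling back to the
// scheme default when URL.port is the empty string.
function effectivePort(input) {
  const parsed = new URL(input);
  if (parsed.port !== '') return Number(parsed.port);
  return DEFAULT_PORTS[parsed.protocol];
}

console.log(effectivePort('http://www.github.com:80')); // 80
console.log(effectivePort('https://127.0.0.1:8080'));   // 8080
console.log(effectivePort('https://[::1]'));            // 443
```

For any scheme missing from the table the helper returns undefined, which a caller would need to handle.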

codecov bot commented Jun 17, 2017

Codecov Report

Merging #1004 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1004   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files           5       5           
  Lines         356     366   +10     
======================================
+ Hits          356     366   +10
Impacted Files Coverage Δ
lib/context.js 100% <ø> (ø) ⬆️
lib/request.js 100% <100%> (ø) ⬆️


@@ -5,6 +5,7 @@
* Module dependencies.
*/

const URL = require('url').URL;
const net = require('net');
const contentType = require('content-type');
const stringify = require('url').format;
Contributor Author

@fl0w fl0w Jun 17, 2017

Thought about destructuring here (const { format: stringify, URL } = require('url')),
though I'm not seeing it elsewhere, so I assumed it was taboo in koajs/koa.

Member

@fl0w i think we didn't add it for node v4 support previously

@hiroppy hiroppy left a comment

I think you should change node's engines in package.json to 7.0.0.

Contributor Author

fl0w commented Jun 19, 2017

@abouthiroppy that will mess with users who transpile Koa down to e.g. node 6.

lib/request.js Outdated
@@ -242,9 +243,12 @@ module.exports = {
*/

get hostname() {
Contributor Author

This needs some adjusting - the code path is weird and for some reason I'm not using host whilst also assigning const host = this.host (relic from the past). I'll clean this up later today.

lib/request.js Outdated
if (!this.parsedUrl) {
const host = this.host;
if (!host) return '';
this.parsedUrl = new URL(`${this.protocol}://${this.host}`);
Member

How about the performance? Slow down or faster?

Member

Use the WHATWG parser for hostname only when host starts with [; otherwise keep the old way to retain high performance.
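
The suggestion above can be sketched as a standalone function. This is an illustration only: hostnameOf is a hypothetical name, and it assumes "the old way" means splitting the Host header on ':'.

```javascript
const { URL } = require('url');

// Hypothetical standalone version of the idea. `host` is the Host header
// value; `protocol` is 'http' or 'https'.
function hostnameOf(host, protocol = 'http') {
  if (!host) return '';
  if (host[0] === '[') {
    // IPv6 literal: a naive split on ':' would cut the address apart,
    // so delegate to the WHATWG parser.
    try {
      return new URL(`${protocol}://${host}`).hostname;
    } catch (err) {
      return ''; // malformed host, e.g. a missing closing bracket
    }
  }
  // Fast path for names and IPv4 addresses: strip an optional ":port".
  return host.split(':')[0];
}

console.log(hostnameOf('github.com:3000'));   // github.com
console.log(hostnameOf('[::1]:801', 'https')); // [::1]
console.log(hostnameOf('[::1'));              // (empty string)
```

Only bracketed hosts pay the parsing cost; everything else stays on the cheap string path.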

Member

image

Contributor Author

@fl0w fl0w Jun 19, 2017

Mind sharing your bench src?

Member

Member

seems like a potential module to me :)

Contributor Author

@fl0w fl0w Jun 19, 2017

@jonathanong the hostname validation or bench src? :)

Member

@fengmk2 fengmk2 left a comment

Consider performance before merge.

Contributor Author

fl0w commented Jun 19, 2017

@fengmk2 @jonathanong
I exposed parsed URL and made it complete with originalUrl (maybe used for other stuff now or in the future, like Punycode? Where request.hostname will return empty) Edit: actually this was presumptuous of me - I'm not 100% sure about current behaviour.

The naming might be conflicting with url?
Kept memoization.

I'll add docs and tests for get URL() if this is an acceptable solution (and naming).

lib/request.js Outdated
const protocol = this.protocol;
const host = this.host;
const originalUrl = this.originalUrl || ''; // avoid undefined in template string
this.memoizedURL = new URL(`${protocol}://${host}${originalUrl}`);
Contributor Author

@fl0w fl0w Jun 19, 2017

URL will throw TypeError if something's off.
Koa generally has value || '' type of solutions in most cases.

Maybe wrap this in a try-catch and memoize empty object so that this.URL.hostname when malformed does not throw/crash the entire app?

Contributor Author

A try-catch shouldn't deopt since node 7.x (v8/v8@9aac80f)

Member

> Maybe wrap this in a try-catch and memoize empty object so that this.URL.hostname when malformed does not throw/crash the entire app?

+1

lib/request.js Outdated
@@ -244,10 +245,29 @@ module.exports = {
get hostname() {
const host = this.host;
if (!host) return '';
if ('[' == host[0]) return this.URL.hostname; // IPv6
Member

👍

Member

this.URL can be an empty object, return this.URL.hostname || '' here
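
The try-catch fallback discussed earlier and the `|| ''` guard from this comment fit together roughly as follows. This is a sketch under assumptions, not Koa's merged source: createRequest is a hypothetical stand-in for the request object, and the "empty object" is modelled with Object.create(null).

```javascript
const { URL } = require('url');

// Hypothetical stand-in for Koa's request object; only the fields the
// getter needs are modelled.
function createRequest(protocol, host, originalUrl) {
  return {
    protocol,
    host,
    originalUrl,
    get URL() {
      if (!this.memoizedURL) {
        const path = this.originalUrl || ''; // avoid "undefined" in the template string
        try {
          this.memoizedURL = new URL(`${this.protocol}://${this.host}${path}`);
        } catch (err) {
          // Malformed input: memoize an empty object instead of throwing.
          this.memoizedURL = Object.create(null);
        }
      }
      return this.memoizedURL;
    }
  };
}

const good = createRequest('https', '[::1]:801', '/index.js');
const bad = createRequest('https', '[::1', '/index.js'); // missing closing bracket

console.log(good.URL.hostname || ''); // [::1]
console.log(bad.URL.hostname || '');  // (empty string)
```

Without the `|| ''` guard, the malformed case would yield undefined rather than the empty string Koa getters usually return.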

@jonathanong jonathanong merged commit 327b65c into koajs:master Jun 20, 2017
@urugator
Contributor

I am not sure this is a fortunate solution.
I think it brings an inconsistency into how the request object's state relates to the underlying req object, depending on the hostname format.
Until now the request was always up to date with the req object, so any changes to req were projected onto the request accordingly.
Due to the memoization this is no longer true: if I do request.headers['host'] = 'rewrittenHost', the output of request.hostname will depend on the hostname format and on whether it has already been accessed. Even more weirdly, request.host can mismatch request.hostname.
I don't know to what extent modifications of req inside middlewares are supported, but I think the behavior should not depend on the hostname format, as it can be a source of confusion and unpredictability.
Also note that the returned URL object can be modified outside of the request object, possibly changing the outcome of request.hostname, again depending on the hostname format...
Maybe you could make everything (protocol, hostname, url...) dependent on that URL object by default, keeping it as the source of truth instead of the wrapped req. It wouldn't be ideal, but at least quite consistent.
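
The inconsistency described in this comment can be reproduced with a small model. The createRequest factory below is a hypothetical miniature of the behaviour under discussion, not Koa's actual source; it assumes the non-IPv6 path splits the Host header on ':'.

```javascript
const { URL } = require('url');

// Hypothetical miniature of the request object.
function createRequest(headers) {
  return {
    headers,
    protocol: 'http',
    get host() { return this.headers.host || ''; },
    get URL() {
      if (!this.memoizedURL) {
        try {
          this.memoizedURL = new URL(`${this.protocol}://${this.host}`);
        } catch (err) {
          this.memoizedURL = Object.create(null);
        }
      }
      return this.memoizedURL;
    },
    get hostname() {
      const host = this.host;
      if (!host) return '';
      if (host[0] === '[') return this.URL.hostname || ''; // IPv6: read from the memoized URL
      return host.split(':')[0];                            // otherwise: read from the live header
    }
  };
}

// Non-IPv6 host: hostname follows a header rewrite.
const a = createRequest({ host: 'example.com:3000' });
const aBefore = a.hostname;            // 'example.com'
a.headers.host = 'rewritten.test:3000';
const aAfter = a.hostname;             // 'rewritten.test'

// IPv6 host: hostname is frozen at the first access.
const b = createRequest({ host: '[::1]:3000' });
const bBefore = b.hostname;            // '[::1]'
b.headers.host = '[::2]:3000';
const bAfter = b.hostname;             // still '[::1]'

console.log(aBefore, aAfter, bBefore, bAfter, b.host);
```

After the rewrite, b.host reports '[::2]:3000' while b.hostname still reports '[::1]', which is the host/hostname mismatch the comment points out.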

Contributor Author

fl0w commented Jun 22, 2017

Yes, this is a side effect I've been pondering. As a quick solution, you can clear the memoization by setting request.memoizedURL = undefined, which will force a new parse.
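
A rough illustration of that workaround, using a hypothetical request literal with just enough state to show the effect (the real getter also involves protocol and originalUrl):

```javascript
const { URL } = require('url');

// Hypothetical minimal request: a headers bag plus a memoizing URL getter.
const request = {
  headers: { host: '[::1]:3000' },
  get URL() {
    if (!this.memoizedURL) {
      this.memoizedURL = new URL(`http://${this.headers.host}`);
    }
    return this.memoizedURL;
  }
};

const first = request.URL.hostname;   // '[::1]'
request.headers.host = '[::2]:3000';
const stale = request.URL.hostname;   // still '[::1]', served from the memo

request.memoizedURL = undefined;      // clear the memo to force a new parse
const fresh = request.URL.hostname;   // '[::2]'

console.log(first, stale, fresh);
```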

This was done because the parse felt quite costly (as noted by fengmk2).

edit

> Maybe you could make everything (protocol, hostname, url...) dependent on that URL object by default, keeping it as a source of truth instead of the wrapped req. It wouldn't be ideal, but at least quite consistent.

This was my initial proposition, but currently it seems to hurt performance significantly. That was in comparison to request.host; I'm not sure about the actual implications for a real-world application.

Contributor Author

fl0w commented Jun 22, 2017

@urugator though correct me if I'm wrong - this should only be of concern when hostname is called with an IPv6 host?

As for the inconsistencies, memoization could be removed, but it would still require one to fetch a new URL object every time request.req.[headers|host|*] is changed.

Contributor

urugator commented Jun 22, 2017

@fl0w The concern is that the code behaves differently based on the hostname format.
In one request hostname depends on req.headers[host] and in another on memoizedURL (where both are exposed to the user as different things).
If I have a middleware that changes headers, like req.headers[host] = newHostname, it will work for some requests but not for others.
Similarly, a middleware that modifies the URL object, like request.URL.hostname = newHostname;, sometimes also modifies request.hostname and sometimes does not.
The same goes for modifying request.URL.host, which sometimes also modifies request.hostname, but never request.host.
The behavior also depends on which format headers["host"] was updated to, whether that happened before or after URL was accessed (memoized), etc.
I think it should be clear how things depend on one another and how changing one thing affects another; this behavior should always stay the same and shouldn't depend on the host format.

Actually I would say that simply stating that IPv6 hostnames are not supported is a more acceptable solution, because it seems easier to reason about.

However, I still wonder whether the WHATWG parser is really necessary for this.

> require one to fetch a new URL

It has to be kept in sync manually, but not re-constructed:

const newHost = "myLittleHost"
request.headers["host"] = newHost;
request.URL.host = newHost;
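
A self-contained version of that manual sync, assuming a plain headers object and an already-parsed URL standing in for request.headers and the memoized request.URL (both names here are illustrative):

```javascript
const { URL } = require('url');

// Stand-ins for request.headers and the memoized request.URL.
const headers = { host: '[::1]:3000' };
const memoized = new URL(`http://${headers.host}`);

const newHost = '[::2]:4000';
headers.host = newHost;
memoized.host = newHost; // the WHATWG host setter updates hostname and port in place

console.log(memoized.hostname); // [::2]
console.log(memoized.port);     // 4000
```

The same URL instance is reused, so nothing needs to be re-constructed, but both assignments have to be made together or the two views drift apart again.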
