Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: refactor truncating long hostname #9292

Closed
wants to merge 4 commits into from

Conversation

jun-oka
Copy link
Contributor

@jun-oka jun-oka commented Oct 26, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

lib url
test url

Description of change

I deleted and refactored the line around 417 in url.js which was truncating hostname if hostname length after . is more than 63. Thus url parse behavior for hostname should be same as browser's behavior. Now it doesn’t truncate hostname.

@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Oct 26, 2016
Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I suppose.

@mscdex mscdex added the url Issues and PRs related to the legacy built-in url module. label Oct 26, 2016
Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The moving-part-of-the-hostname-to-the-path bit is really surprising and, I would argue, buggy behavior. Is that documented anywhere? If not, I wonder if the correct thing to do is a (hopefully semver-patch) change that treats such URLs as invalid. (A hostname longer than 63 chars is invalid per RFC 1035.)

EDIT: It doesn't appear to be documented anywhere.

@Trott
Copy link
Member

Trott commented Oct 26, 2016

/cc @nodejs/http-parser I guess, although I suppose this isn't http-specific...

@jun-oka
Copy link
Contributor Author

jun-oka commented Oct 26, 2016

@Trott OK. I agree. I'll look into the document again.
The only way I could pass this URL on test was this way, so I thought it was true.
However, according to your opinion, it seems buggy.

@Trott
Copy link
Member

Trott commented Oct 26, 2016

Looks like this behavior was introduced (or at least refactored?) in db912813 by @mscdex. I'm not sure if it was unintentional and the intended behavior was something else, or if this is totally intended behavior on the theory that "garbage input results in garbage output". So maybe @mscdex can let us know?

@jasnell
Copy link
Member

jasnell commented Oct 28, 2016

ping @mscdex

@mscdex
Copy link
Contributor

mscdex commented Oct 28, 2016

I honestly don't remember, that was awhile ago. All I could go by is whatever tests we had at the time.

@Trott
Copy link
Member

Trott commented Oct 28, 2016

Looking at what new URL(url) does in the browser, it does not truncate the hostname component. I would say we should follow suit: Don't try to validate the hostname length.

I would say the path forward is:

  • rewrite the expected test result to preserve the hostname
  • update url.js to not truncate long hostnames
  • update the subsystem in the PR name and the commit message to be url
  • Get some input from @nodejs/lts on the semver implications of this. I think it's a bug fix, personally, but if anyone wants to argue that it should be treated as semver-major out of an abundance of caution, I'm OK with that.

What do others think?

@Trott
Copy link
Member

Trott commented Oct 28, 2016

I should add that, to me at least, the real objectionable thing here is not the truncation but the moving of the remainder of the hostname into the path. That just makes no sense to me. I cannot think of a situation where that is helpful. And I can think of situations where that can introduce bugs, for sure.

@jun-oka
Copy link
Contributor Author

jun-oka commented Nov 4, 2016

@Trott Thanks.
I will do this part.

  • rewrite the expected test result to preserve the hostname
  • update url.js to not truncate long hostnames
  • update the subsystem in the PR name and the commit message to be url

@sam-github
Copy link
Contributor

re: #9521 (comment), I think truncation is bizarre.

I don't believe that the URL RFCs say that host names have to be _DNS_ host names, there are lots of ways of resolving names, so I think truncation is a bug.

Here's an example:

% tail /etc/hosts

127.0.0.1 ThisIsAnAbsurdlyLongHostNameButSurelyNoneOfThisShouldEndUpAsPartOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com
% ping -c1 ThisIsAnAbsurdlyLongHostNameButSurelyNoneOfThisShouldEndUpAsPartOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com
PING ThisIsAnAbsurdlyLongHostNameButSurelyNoneOfThisShouldEndUpAsPartOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.065 ms

host names clearly are not the same as DNS names, they can be longer than DNS allows and still be resolveable.

@jun-oka jun-oka changed the title test: expand test coverage for url.js url: stop to truncate long hostname into path Dec 1, 2016
@jun-oka jun-oka changed the title url: stop to truncate long hostname into path url: stop to truncate long hostname Dec 1, 2016
@jun-oka jun-oka changed the title url: stop to truncate long hostname url: stop truncating long hostname Dec 1, 2016
@jun-oka jun-oka force-pushed the test-url-coverage branch 4 times, most recently from de9563d to c186062 Compare December 1, 2016 04:24
@jun-oka jun-oka changed the title url: stop truncating long hostname url: refactor truncating long hostname Dec 1, 2016
Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if new CI is ✅

@Trott
Copy link
Member

Trott commented Dec 1, 2016

@cjihrig @jasnell This is a bit different (and, in my opinion, better) than what was originally submitted. Can you re-check the code to make sure it is still good by you?

@Trott
Copy link
Member

Trott commented Dec 1, 2016

@Trott
Copy link
Member

Trott commented Dec 1, 2016

/cc @sam-github

@jun-oka
Copy link
Contributor Author

jun-oka commented Dec 7, 2016

@cjihrig @jasnell @sam-github It would be nice if you can check it!

@sam-github
Copy link
Contributor

I'm not familiar with the code, it would make more sense for @jasnell (who just wrote a URL decoder) or @mscdex (who might have implemented the truncation) looked at it.

@mscdex mscdex added the semver-major PRs that contain breaking changes and should be released in the next major version. label Dec 7, 2016
const code = hostname.charCodeAt(i);
const isVc = code === 46/*.*/ ||
code >= 48/*0*/ && code <= 57/*9*/ ||
code >= 97/*a*/ && code <= 122/*\z*/ ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a backslash before z here that needs to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thank you for finding.

for (var i = 0; i <= hostname.length; ++i) {
const code = hostname.charCodeAt(i);
const isVc = code === 46/*.*/ ||
code >= 48/*0*/ && code <= 57/*9*/ ||
Copy link
Contributor

@mscdex mscdex Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the parens around the groups of checks, IMHO it makes it easier to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think so too.

@mscdex
Copy link
Contributor

mscdex commented Dec 8, 2016

@Trott I'm just being more cautious is all, especially since this behavior has existed for a long time (even before my rewrite). If everyone else feels differently, then go ahead and remove the semver-major label.

@jun-oka
Copy link
Contributor Author

jun-oka commented Dec 9, 2016

@mscdex
Thank you for those reviewing.
I will look into them.

@jun-oka
Copy link
Contributor Author

jun-oka commented Dec 23, 2016

@mscdex @Trott
I have just updated!

@Trott
Copy link
Member

Trott commented Dec 24, 2016

@mscdex
Copy link
Contributor

mscdex commented Dec 24, 2016

There's a couple of changes that still need to be made, namely the for loop iteration condition and the isVc variable naming.

@jun-oka jun-oka closed this Dec 24, 2016
@jun-oka jun-oka reopened this Dec 24, 2016
(code >= 65/*A*/ && code <= 90/*Z*/) ||
code === 43/*+*/ ||
code === 95/*_*/||
code > 127;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're changing things in here, I think it might be better to re-prioritize the conditionals here to optimize for the common cases. Here is what I propose:

const isVc = (code >= 97/*a*/ && code <= 122/*z*/) ||
             code === 46/*.*/ ||
             (code >= 65/*A*/ && code <= 90/*Z*/) ||
             (code >= 48/*0*/ && code <= 57/*9*/) ||
             code === 45/*-*/ ||
             code === 43/*+*/ ||
             code === 95/*_*/||
             code > 127;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex In my opinion, that makes more sense. It looks more common in this case. Thanks a lot.

Currently, around line 417 lib/url.js is truncating hostname and put the rest of hostname to the path if hostname length after `.`  is equal or more than 63. This behavior is different from browser behavior. I changed the code so that it doesn’t truncate.
I also added the test example which has more than 63 length in after `.`  in hostname in test url.
@jun-oka
Copy link
Contributor Author

jun-oka commented Dec 24, 2016

@mscdex I have applied your review which you proposed.

(code >= 48/*0*/ && code <= 57/*9*/) ||
code === 45/*-*/ ||
code === 43/*+*/ ||
code === 95/*_*/||
Copy link
Contributor

@mscdex mscdex Dec 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: there is a missing space before the ||. I missed that in my proposed example earlier.

@jun-oka
Copy link
Contributor Author

jun-oka commented Dec 24, 2016

@mscdex Thank you! I corrected it. I will be careful.

@Trott
Copy link
Member

Trott commented Dec 24, 2016

@mscdex
Copy link
Contributor

mscdex commented Dec 25, 2016

LGTM

@Trott
Copy link
Member

Trott commented Dec 25, 2016

(arm-fanned failure on CI is infra related and unrelated to this change.)

Trott pushed a commit to Trott/io.js that referenced this pull request Dec 25, 2016
Currently, around line 417 lib/url.js is truncating hostname and put the
rest of hostname to the path if hostname length after `.`  is equal or
more than 63. This behavior is different from browser behavior. I
changed the code so that it doesn’t truncate.

I also added the test example which has more than 63 length in after
`.` in hostname in test url.

PR-URL: nodejs#9292
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
@Trott
Copy link
Member

Trott commented Dec 25, 2016

Landed in c65d55f.

Two months in the making. 🎉

@Trott Trott closed this Dec 25, 2016
italoacasas pushed a commit to italoacasas/node that referenced this pull request Jan 18, 2017
Currently, around line 417 lib/url.js is truncating hostname and put the
rest of hostname to the path if hostname length after `.`  is equal or
more than 63. This behavior is different from browser behavior. I
changed the code so that it doesn’t truncate.

I also added the test example which has more than 63 length in after
`.` in hostname in test url.

PR-URL: nodejs#9292
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
@joyeecheung joyeecheung mentioned this pull request Feb 2, 2017
3 tasks
@jasnell jasnell mentioned this pull request Apr 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-major PRs that contain breaking changes and should be released in the next major version. test Issues and PRs related to the tests. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants