Skip to content
This repository has been archived by the owner on Dec 11, 2019. It is now read-only.

Hardened isNotURL / isURL code in urlutil.js #6086

Merged
merged 4 commits into from
Dec 8, 2016
Merged

Hardened isNotURL / isURL code in urlutil.js #6086

merged 4 commits into from
Dec 8, 2016

Conversation

bsclifton
Copy link
Member

  • Submitted a ticket for my issue if one did not already exist.
  • Used Github auto-closing keywords in the commit message.
  • Added/updated tests for this change (for new code or code which already has tests).
  • Ran git rebase -i to squash commits (if needed).

Fixes #5911
(which was unintentionally introduced with #4546)

Also fixes these cases (no issues created yet):

  • view-source/mail-to/etc were not checking from beginning of string
  • ensures input is string (preventing failing call to trim())

And adds missing tests for:

  • checking localhost toLowerCase()
  • checking basic auth formatted URI (with and without protocol)
  • is input a string
  • each type of special page
  • partial bad matches on special pages

Also updates tests for isURL to ensure value is always negated version of isNotURL

Auditors

@bbondy, @darkdh, @diracdeltas

Since this code has a very large potential impact (everything from the URL bar calls this for example), please review and help make sure I covered everything safely :)

Unit test

npm run unittest -- --grep="urlutil"

Webdriver test

# window 1
npm run watch-test
#window 2
npm run uitest -- --grep="when following URLs"

Fixes #5911
(which was unintentionally introduced with #4546)

Also fixes these cases (no issues created yet):
- view-source/mail-to/etc were not checking from beginning of string
- ensures input is string (preventing failing call to trim())

And adds missing tests for:
- checking localhost toLowerCase()
- checking basic auth formatted URI (with and without protocol)
- is input a string
- each type of special page
- partial bad matches on special pages

Also updates tests for isURL to ensure value is always negated version of isNotURL

Auditors: @bbondy, @darkdh, @diracdeltas

Since this code has a very large potential impact (everything from the URL bar calls this for example), please review and help make sure I covered everything safely :)

Unit test:

`npm run unittest -- --grep="urlutil"

Webdriver test:

`npm run uitest -- --grep="when following URLs"`
// for cases:
// - starts with "?" or "."
// - contains "? "
// - ends with "." (and was not preceded by a /)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type 123/123. will be search string on other browsers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in cd71e0e and test is also covered. Mainly changing logic of end with period, which is
brave.com/123. will be url and brave/com/123. will not

assert.equal(UrlUtil.isNotURL('brave.com/test/cc?_ri_=3vv-8-e.'), false)
})
it('ends with period (input contains only a forward slash)', function () {
assert.equal(UrlUtil.isNotURL('brave/com/test/cc?_ri_=3vv-8-e.'), true)
Copy link
Member Author

@bsclifton bsclifton Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for an extra . might not be a good idea, because folks may use Brave to browse their intranet. It's not a best practice, but many folks will setup a host file entry and access hosts by computer name (instead of domain)

ex: http://computer001/phpMyAdmin/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually- I think this behavior might be OK! I am almost positive you are correct about the other behavior (because it's happened to me so many times 😛 ) When prefixed with protocol, it validates just fine: cd88861

@diracdeltas
Copy link
Member

i'm not sure why we are using regexes and window.URL in urlutil at all. seems like you should just be able to do urlParse and conclude the URL is invalid if protocol and host are null.

@bbondy
Copy link
Member

bbondy commented Dec 8, 2016

@diracdeltas

i'm not sure why we are using regexes and window.URL in urlutil at all. seems like you should just be able to do urlParse and conclude the URL is invalid if protocol and host are null.

I didn't write this original code but really it's more about heuristics about if a user meant to type a url or not. For example require('url').parse('a b') will give output of no protocol but will give pathname, path and href. The user means a search in that case.

For example require('url').parse('www.test.com') will give the same output types. And in that case it's meant to be a URL. So this function is more about what the user intended for it to be, and not whether it can be considered a URL or not.

@bbondy
Copy link
Member

bbondy commented Dec 8, 2016

The second example has host not filled in by the way

Copy link
Member

@bbondy bbondy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better tested than what we had before and works better so even if we refactored it later to use urlParse in some other way we can still at that point keep the tests and change the implementation. So merging.

// for cases, pure string
const case3Reg = /[\?\.\/\s:]/
// for cases, data:uri, view-source:uri and about
const case4Reg = /^data:|view-source:|mailto:|about:|chrome-extension:|magnet:.*/
const case4Reg = /^(data|view-source|mailto|about|chrome-extension|magnet):.*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:|

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch

@bbondy bbondy merged commit 163f1e3 into brave:master Dec 8, 2016
@diracdeltas
Copy link
Member

@bbondy i figured, but i was hoping we could just do urlParse after normalizing the input; i will open a separate issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clicking link from email will sometimes will do a search instead of directly following
5 participants