Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes email validation regular expressions #7005

Closed
wants to merge 20 commits into from

Conversation

smz
Copy link
Contributor

@smz smz commented May 21, 2015

This fixes regular expressions used for email validation (see: #6982)


Joomla doesn't allow ' char in user email, JFormRuleEmail pattern has ` but not '

It is allowed to have an apostrophe ' in the e-mail address by the RF5322 standard as long as it is an ASCII character.

Irish email as example: o'sullivan@example.com

This fixes regular expressions used for email validation (see: joomla#6982)
@smz
Copy link
Contributor Author

smz commented May 21, 2015

@zero-24 (I suppose...) Thanks for fixing my initial comment!

@smz
Copy link
Contributor Author

smz commented May 21, 2015

oops... I can't understand what Travis is complaining for...
NOW I know... 😔

@smz
Copy link
Contributor Author

smz commented May 22, 2015

nope... it doesn't work: problem in validate.js. Working on it, please stand-by... sorry!

@smz
Copy link
Contributor Author

smz commented May 22, 2015

Ready for testing. Sorry for the initial (hopfully...) f##k-up.

@ghost
Copy link

ghost commented May 22, 2015

I tried to add a new user in backend and frontend. Tested emails

hello're@example.org
o'sullivan@example.com

Result is

Error
Save failed with the following error: Please enter a valid email address

Warning
Registration failed: Please enter a valid email address.

I think it's the PHP validation because page reloads after form submit and then message occurs.

@smz
Copy link
Contributor Author

smz commented May 22, 2015

@bertmert
after applying the patch, please refresh the JS by shift+reload... (or clear the browser cache)...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

btw... where are you testing? I tested in com_contacts...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

silly me: you already stated where you are testing! let me try...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

You're right: it doesn' work with com_users (it works with com_contacts, anyway...) I have to find out the reason, but now I'm leaving for some hours. Please stand-by...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

even in com_contact com_users JS validation is OK: o'sullivan is accepted (black field), o"sullivan is rejected (red field), but then it doesn't save... must be something within the component itself... will check later... sorry.

@ghost
Copy link

ghost commented May 22, 2015

I'll send you a PR to branch .smz:EmailValidation.

There's another validation in libraries/joomla/mail/helper.php::isEmailAddress line 130

$allowed = 'a-zA-Z0-9.!#$%&’*+\/=?^_`{|}~-';

called by libraries/joomla/table/user.php::check line 198

Changed line to

$allowed = 'a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-';

and all works now.

@Denitz
Copy link
Contributor

Denitz commented May 22, 2015

I see BC issue:
Initial 'grave accent' char ` (0x60) was replaced with 'single quote' ' (0x27), so all current emails with 0x60 will be invalidated and users won't be able to save profiles.

We should add new allowed char (single quote) to patterns but not replace any existing chars.

Allow apostrophe in com_user emails
@smz
Copy link
Contributor Author

smz commented May 22, 2015

@bertmert thanks for your PR for fixing validation in com_users! Merged.
Actually I think JMailHelper::isEmailAddress() needs more fixes as it allows invalid characters in the domain part of email addresses. I'll look into that and see if it is worth fixing...

(oops... while I'm writing this, I see Travis failed after merging your PR... will look into that...)

@Denitz IMHO this is not breaking B/C: this is fixing a bug introduced by someone who used the wrong apostrophe (my bet is that he/she was using a Mac while editing...). Try register an email address containing the grave accent char with Gmail, outlook.com, Yahoo or any other major email provider and see if you can...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

Travis is failing because of this:

There were 2 failures:
1) JMailHelperTest::testIsEmailAddress with data set #19 ('o\'reilly@there.com', false)
Failed asserting that true matches expected false.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104

2) JMailHelperTest::testIsEmailAddress with data set #20 ('o’reilly@there.com', true)
Failed asserting that false matches expected true.
/home/travis/build/joomla/joomla-cms/tests/unit/suites/libraries/joomla/mail/JMailHelperTest.php:301
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///home/travis/.phpenv/versions/5.3.29/bin/phpunit/phpunit/TextUI/Command.php:104

IMHO it is a mistake in test unit (same mistake we had in code).
I'm calling PLT to see what we should do...

@smz
Copy link
Contributor Author

smz commented May 22, 2015

Some further considerations (food for thought...):

  • in the phpmailer library there is a similar validation regex. Although a bit more sophisticated (in the domain checking part) it is compatible with the one introduced with this PR. The previous regex we used wasn't.
  • I think actually both our regex and the phpmailer one are wrong: quoted strings are allowed in email address (e.g.: "John Doe"@example.com)
  • some think email addresses should not be checked for RFC compliance: see http://girders.org/blog/2013/01/31/dont-rfc-validate-email-addresses/

@smz
Copy link
Contributor Author

smz commented May 22, 2015

More information:

The HTML5 standard defines the rule for validating the <input type=email> element with the following PCRE regexp:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

According to the standard itself, the above is not a strict implementation of the relevant RFC5322:

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

I think the more sensible solution for us is to follow the HTML5 standard and implemant the same validation rules. This is, I guess, particullay true for media/system/js/html5fallback.js.

A further consideration is that we should be consistent with that (or any other) validation rule: if we use a rule in our JS code, we cannot use a different rule in our PHP code.

Due to the above, I will soon submit to this PR a commit in line with the above considerations (it will fail with Travis for the same reasons it is failing right now).

Also: in libraries/joomla/form/rule/email.php in theory there is provision for validating only the local-part of an email address (i.e: without the domain part), but this wasn't actually implemented. I'm unsure what to do with that... In the meanwhile I will do what was previously done, that is always check for a full email address.

If you think there is anything wrong with the above, please let me know...

@dgrammatiko
Copy link
Contributor

if we use a rule in our JS code, we cannot use a different rule in our PHP code

I don’t think I agree with that. JS validation is to check (coarse) that an email address is provided and PHP should make sure that this address actually exists etc

@smz
Copy link
Contributor Author

smz commented May 23, 2015

Dimitris, our PHP code has never checked that an address actually exist (not even the domain): it has always done a formal check of the email address, the same as JS does.

@dgrammatiko
Copy link
Contributor

Sergio, I just stated what my personal preference is here. It’s not something that should prevent you from doing you think is best.
But here is some infos that support this:
For js regex whatever you will come up to I don’t think its gonna be bulletproof https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
For php this is easier: http://isemail.info
Ooops I meant to write validate not exist

@dgrammatiko
Copy link
Contributor

@test works for o’sulivan@yahoo.gr

@smz
Copy link
Contributor Author

smz commented May 24, 2015

@dgt41 Dimitris, does the .gr registry allows domains names in Greek? In case, do you have (or can you create) a mailbox in such a domain (possibly two: one with an ASCII local part and one with a Greek local part)?

Thanks!

@dgrammatiko
Copy link
Contributor

@smz Haven’t tried, TBH, give me 5mins to check!

@smz
Copy link
Contributor Author

smz commented May 24, 2015

No prob and no hurry! Take your time...

@dgrammatiko
Copy link
Contributor

@smz Damn all my sites use google, zoho and msn for the mail part. I am gonna need some time for this

@smz
Copy link
Contributor Author

smz commented May 24, 2015

No prob... in case you can, please let me know (direct email, so we don't litter this conversation too much...)

smz added 2 commits May 25, 2015 00:08
There were a lot of [space][tab] sequences in it and I converted all of
them to [tab]
@infograf768
Copy link
Member

That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)

Correct.

Concerning our punycode.js
#7005 (comment)

I suggest again to update it and test.

@infograf768
Copy link
Member

@dgt41 Dimitris, does the .gr registry allows domains names in Greek? In case, do you have (or can you create) a mailbox in such a domain (possibly two: one with an ASCII local part and one with a Greek local part)?

We do have a Russian test domain (with a joomla site btw)
My experience is that utf8 local parts as are no good.

@smz
Copy link
Contributor Author

smz commented May 25, 2015

That thing I removed from the regexp (and I think should be removed from test units) is not an ASCII grave accent, but an Unicode "right single quotation mark", which is not ASCII, but U+2019 (hex sequence: E2 80 99)

Correct.

So... how can we have the test unit corrected?

Concerning our punycode.js
#7005 (comment)

I suggest again to update it and test.

I see: I downloaded the latest rev (1.3.2) and now they're doing exactly the same thing I did in validate.js, so there is probably no need for us doing it there. I will test and report.

@smz
Copy link
Contributor Author

smz commented May 25, 2015

@dgt41, @infograf768
I've registered the Greek domain πάνταρει.gr ("panta rei: everything flows"... isn'it it nice? Heraclitus would be happy!) and will contribute it to the community for whatever test will be needed. Punycode domain name is: xn--hxakmqrqkw.gr

As soon as possible I will set up a (Joomla, of course!) site for it and email service (which, for the time being, will not allow UTF8 local parts). I'll keep you informed.

@wilsonge
Copy link
Contributor

I dunno how much it helps but there was a long discussion about dealing with validating emails a long time ago joomla/joomla-platform#1833

@smz
Copy link
Contributor Author

smz commented May 25, 2015

@wilsonge Thanks, interesting reading! And I like what the framework came to:

return (boolean) filter_var($email, FILTER_VALIDATE_EMAIL);

KISS code, which is normally the best solution...

I'll check if it will pass our test corpus...

I anyway see a (recurring) mistake in that discussion too: IP addresses, when used in an email address domain-part, must be enclosed within square brackets. Once again:

local-part@[192.168.0.1] <-- OK This will be taken as an IP address
local-part@192.168.0.1 <-- KO! (it is subdomain "192.168" of the domain ".0" in (non existent) TLD ".1")

(Yes, you can have numeric domain names (at least in the .it TLD): I used to have some of those for the telephone company I was working for...)

Do we want this to pass? Putting an IP address in the domain-part is something normally done by spammers only...

Edit: another possibility to use an IP addresses in email address might be rDNS notation:

local-part@1.0.168.192.in-addr.arpa

... but I'm unsure if you can have MX records in rDNS zones...

@smz
Copy link
Contributor Author

smz commented May 25, 2015

@infograf768
You were right: punycode.js v1.3.2 fixes the handling of email addresses: no need to modify validate.js (I will anyway fix all those extra spaces in validate.js...)

Please see my EmailValidation_PunycodeJS_132 branch.
In it validate.js has a console.log() to display the "punycoded" value (it will be removed before committing to joomla-cms, in case)

In that branch I also modified /libraries/cms/html/behavior.php so that it loads punycode.min.js, i.e. the original name of the minified version of punycode.js: I think it is better to keep original file names for external libraries whenever it is possible (think of composer...). Is that OK?

Edit: Commited here, now. Deleted my temp branch.

@dgrammatiko
Copy link
Contributor

@smz just update punycode-uncompressed.js to be inline with the other scripts (my take). I guess we can’t take this file out as it will b B/C(?)

@smz
Copy link
Contributor Author

smz commented May 26, 2015

@dgt41, well... we never loaded any *-uncompressed.js file anywhere...

I think the worst (an unlikely) can happen is that some 3rd party extensions directly loads punycode.js expecting the minified version and it gets the expanded version instead...

At least another JS library in /media/system/js have now the new (and IMHO more logical) naming convention: jquery.Jcrop.min.js

@dgrammatiko
Copy link
Contributor

-uncompressed.js should be loading when in debug mode. But removing it or not is more a PLT decision, so don’t look at me

@smz
Copy link
Contributor Author

smz commented May 26, 2015

ah, really? -uncompressed.js are automatically loaded while in debug mode? I didn't know that, my bad! In this case I will revert my decision and leave filenames unchanged...

(we will sooner or later have to do something as we go toward a more automatic method of updating external libraries, composer or whatever...)

@smz
Copy link
Contributor Author

smz commented May 26, 2015

... fixed in my branch, but (as there is no reason to wait) I will now commit it here too..

This fixes the handling of IDNA domains in JS validation without need to
modify validate.js
@smz
Copy link
Contributor Author

smz commented May 26, 2015

Fixed. Thanks @dgt41!

@smz
Copy link
Contributor Author

smz commented May 26, 2015

Oh, cap! merge conflicts... why???

@dgrammatiko
Copy link
Contributor

validate.js changed recently, do a rebase

@smz
Copy link
Contributor Author

smz commented May 26, 2015

... and html5fallback.js as well... 😒

@smz
Copy link
Contributor Author

smz commented May 26, 2015

Actually only html5fallback.js was changed.

I tried a "git rebase --interactive", but I'm now in deep mud: I have a "DETACHED HEAD", I can't switch branch and I'm stuck without any possibility of doing anything.

My editor opened with the following:

pick e4140cb Fixes email validation regular expressions
pick 479ec12 Fixed regexp as PHP string
pick 3a48af0 Forgot to fix it in another place
pick 63b959c This should work!
pick c58bcde ... but this is the good one!
pick 211a568 Allow apostrophe in com_user emails
pick 226e118 New HTML5 compliant regexp
pick e9df3cb Ouch! What a noob's mistake!!
pick b0fbc5d YANM!
pick 4059299 JMailHelper::isEmailAddress()
pick 7b6cf7c Fix for unit test
pick e323b02 I missed a couple of CS glitches...
pick 4e8b731 Address length in JMailHelper::isEmailAddress()
pick 9ee840a It is useless to check domain-part length
pick f31893d Fix IDNA domains vaildation
pick ca861b1 Made validate.js safer
pick 08f527d A minor modification to validate.js
pick 105eb87 CS in validate.js
pick 5ebb8bc Updated punycode.js to rev. 1.3.2

# Rebase 110c7b1..5ebb8bc onto 110c7b1
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

... but I don't have the slightest idea of what I should do with that... 😖

@smz
Copy link
Contributor Author

smz commented May 26, 2015

I managed to abort the rebase, but I now need guidance on how to do it correctly... sorry, guys!

@smz smz closed this May 26, 2015
@smz smz deleted the EmailValidation branch May 26, 2015 15:33
@smz
Copy link
Contributor Author

smz commented May 26, 2015

I closed this as I was unable to resolve conflicts. New PR is #7041

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants