-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix remaining travis warnings #81
Conversation
Raw strings (marked by the `r` prefix) are normally used for regular expression patterns to allow that module to expand any backslash-escape sequences without interference from the Python interpreter's string expansion, and thus the requirement for extra quoting. This practice is further encouraged by python 3.5 and later raising a `DepreciationWarning` when encountering escape sequences it can't expand, as is common is re patterns. However, in python 2.7, the `re` module doesn't expand \u and \U escapes for unicode codepoints, so patterns with non-ascii characters can only be used in raw strings when using python 3.3 or later, or the external `regex` module, and neither `u` nor `r` quoted strings will work for patterns in both languages. However, Python will concatenate whitespace-separated string literals even if they use different quoting. Take advantage of this to split the lexer patterns into raw- and unicode-quoted segments as appropriate. Also remove some unnecessary escape sequences.
The pep8 name is deprecated, and the tool continues under the new name.
Now that we're using a newer pytest, we can invoke it as such even in the jython build, instead of using the older `py.test`.
The parser's token pattern strings escaped almost all punctuation characters, but the Python `re` module has a fairly limited set of special characters which must be backslash-escaped. In particular, characters like @ ! : and space do not need escapes, and even more are fine inside a [] environment describing a set of characters. Audit the expression strings and remove most unnecessary escapes. I left some sequences like `[^\^\[]` which aren't ambiguous without escapes for clarity of reading.
The default osx image on travis doesn't have an up-to-date set of homebrew packages, so the `brew upgrade` step must compile many packages from source and takes tens of minutes to complete. This makes for very slow test feedback and occasionally failures when the job times out. Travis itself doesn't provide python3 packages through the `language` and `python` yaml keys like it does on Linux. Setting these keys fails trying to download the requested tarball. Give up therefore, and just use whatever python3 was installed on the image by default. Currently that is 3.6.5 from homebrew. This significantly speeds up test feedback.
Codecov Report
@@ Coverage Diff @@
## master #81 +/- ##
=======================================
Coverage 87.24% 87.24%
=======================================
Files 27 27
Lines 996 996
=======================================
Hits 869 869
Misses 127 127
Continue to review full report at Codecov.
|
@ageorgou I was just wondering if you've had time to look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rillian!
This looks fine, with one exception (the missing ?
). I'm not confident enough with the lexer to be sure that the intention is preserved, and that the tests are covering all cases.
If I'm not misreading the missing ?
, could you possibly add a test to cover that case? (which I'm assuming is not covered)
pyoracc/atf/common/atflex.py
Outdated
# --- Rules for paragaph state---------------------------------- | ||
# Free text, ended by double new line | ||
|
||
terminates_para = \ | ||
"(\#|\@[^i][^\{]|\&|\Z|(^[0-9]+[\'\u2019\u2032\u02CA\xb4]?\.))" | ||
r'(#|@[^i][^{]|&|\Z|(^[0-9]+' u'[\'\u2019\u2032\u02CA\xb4]\\.))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is missing a ?
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely a mistake! Thanks for the thorough review. I'll come up with a test to get coverage on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having some trouble coming up with a test for this. If I understand correctly, this pattern tries to match anything which would indicate a new section and thus it's time to leave the paragraph-parsing substate. So we have &
, #
sigils for new texts, processing directives or comments, empty lines, @
sigils with some exceptions (?) and then something that looks like a line number with an optional prime.
The problem is, I don't really understand what a 'paragraph' is in the context of the parser. I've tried adding and removing lines between line numbers but haven't seen a difference in output. Suggestions what the corresponding atf would look like are welcome! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the 'para' state only applies to #note
lines, which continue until a double-newline or the sigil for another data type. This is unlike #tr
lines which must be whitespace-indented to continue over multiple lines.
There is indeed no coverage for this, and silent failures with some of the sample corpus. I think I can write a regression test now.
The #note environment can be any free text up to a double newline. Subsequent lines don't need to be intented like with #tr. It didn't seem like we had text coverage for this feature.
Sanitizing the regular expression pattern strings, I accidentally removed the optional mark ('?') from the collection of prime characters used to conclude a #note when it doesn't end in a double newline. Restore this, and clarify the related comment.
Python 2.7 can't handle the unicode prime values in normal `str` objects, so construct the generated line label as a unicode object instead.
Ok @ageorgou, ready for follow-up! |
Thanks @rillian! I'll take a look and also double check with someone who has more ATF experience, so it will be early next week when I can review with confidence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a look now and it all seems fine to me. Thanks for the extra test, too. As I said above, I'll wait to check with someone who's more of an expert before approving. I have some minor comments, but those can also be changed by us later on.
brew upgrade python | ||
brew install python3 | ||
python3 --version | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I had posted this in an earlier review, oops.
Thanks for noticing that the build was taking a long time because of this!
I've experimented a bit, and we can get a newer environment (Python 3.7.3, pytest 4.6.3) by using a later Xcode image. This gives us (at the moment) more or less the same versions as homebrew, while still not delaying the build. Setting this up is simple, as shown here.
Note that a previous version (Xcode 10) gave a strange error (same as in #74), but maybe that was fixed in a minor pytest
update.
Considering that the Python from the base image (currently Xcode 9.4) is still relatively up-to-date, I suggest leaving this without specifying an image, for easier maintenance - but I wanted to comment on this somewhere!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your experimental branch with the newer macOS image. I came to the same conclusion about the default being easier, although travis isn't doing a great job either way. It would also be good to test on Python 3.7.
I think that should happen in a separate PR, though, so this one can remain focussed on cleaning up warnings in the test output.
u"4\u2032", | ||
u"5\u02CA", | ||
u"6\xb4"]: | ||
self.compare_note_ended_by_line(label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a helper function, I think that it would be a bit clearer to parametrise the test (essentially the code in compare_note_ended_by_line
) by the label. I may be missing something, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would. I tried to use @pytest.mark.parametrize
here like in test_atffile.py
, but couldn't get it to interact properly with the tests being class methods.
It looked like the unittest
functionality could be replaced by a pytest fixture, moving all the tests up to module level and saving an indent, but that's something which touches most of the file, so I thought it was better to leave that for a separate change.
I used this to guide constructing the new unit tests. Thanks to ageorgou for catching that I'd left it in the committed version.
Rename a varable to avoid confusion with a built-in. Thanks to ageorgou for the suggestion.
Thanks for the update. Hopefully all the additional issues are addressed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have double-checked with a local ATF user, and the note rules and tests make sense, so all looks good! Thanks again @rillian.
Great, thanks! \o/ Can you also merge the PR for me? |
Address the remaining escape sequence warnings, and replace deprecated
pep8
withpycodestyle
.Closes #78 .