-
-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import statements can be banned in BUILD files #5180
Import statements can be banned in BUILD files #5180
Conversation
@illicitonion : I believe that all global options are fingerprinted to trigger daemon restarts via #5045, but @kwlzn should be able to confirm. |
Would it be more general to have a list of banned tokens maybe? |
@stuhood If we're just trying to attack import, I think that banning tokens is about as safe and general as hooking import, but noticeably harder; off the top of my head, trying to ban import by token would be regexing over the input python for '[^:valid_in_variable_names:]import[^:valid_in_variable_names:]' or anything which could be constructed to be 'import' (such as '' + 'import'). Are there other tokens you'd like to ban, which would make token search a better approach? |
|
||
|
||
def _fail_on_import(name, *args): | ||
raise Exception('import statements have been banned, but tried to import {}'.format(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a more relevant exception type that could be used here and above vs Exception
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very probably... pants.engine.parser.ParseError
, pants.build_graph.build_file_parser.ParseError
, pants.base.exceptions.TargetDefinitionException
, or a new subclass of pants.base.build_file.BuildFileError
all seem like viable options.
I'm guessing pants.engine.parser.ParseError
is probably the winner, because this file already uses things in that module? I've switched to that one for now...
@@ -0,0 +1,3 @@ | |||
from __future__ import print_function | |||
|
|||
print("Hello") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this go under testprojects/src/python/...
instead? src/java
is unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a more general and vague rule, like genrule
, rather than me needing to arbitrarily pick python as a language to write hello world in? Because really, this is very language agnostic...
But either way, moved.
|
||
python_binary( | ||
name = os.path.join("hello"), | ||
source = "hello.py", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space indent is the repo standard here too, I believe.. but maybe we're just going to be auto-formatting these soon anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
options, _ = OptionsInitializer(OptionsBootstrapper()).setup() | ||
global_options = options.for_global_scope() | ||
if global_options.build_file_imports != 'allow': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this strikes me as odd to do on every .parse()
.. but maybe it's cached somehow as global state?
either way, how about plumbing the value of global_options.build_file_imports
here as an instance attribute and letting an external caller handle the options parsing bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the caller, which is already doing this lookup. If there's a more cached/singeltony way of doing this, I'd love to see it, this was just the code I saw being used for this purpose in a bunch of other places...
else: | ||
raise Exception("Didn't know what to do for build_file_imports value {}".format(global_options.build_file_imports)) | ||
builtins['__import__'] = import_hook | ||
symbols['__builtins__'] = builtins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to not do this injection as part of self._generate_symbols()
when self._symbols
is computed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need filepath to give a good error message in the warn case.
acbb4a9
to
7b45c83
Compare
bootstrap_options_values = bootstrap_options.for_global_scope() | ||
|
||
options, build_config = OptionsInitializer(options_bootstrapper).setup(init_logging=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this will re-force options parsing... we're already passing in the bootstrap_options.
Would it be possible to just define the new option as a bootstrap option instead...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(...yes, I think so. Let's do that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a bootstrap option, reverted all of the change around option parsing, I'll send that as a separate PR because I think there's stuff to be cleaned up there, but that should be separate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the same kwarg for both import behavior flag? build_file_imports_behavior
vs build_file_import_behavior
is confusing to me. I think build_file_imports_behavior
makes sense since build_file_imports
is the name of the option.
@@ -16,7 +16,7 @@ class LegacyPythonCallbacksParserTest(unittest.TestCase): | |||
|
|||
def test_no_import_sideeffects(self): | |||
# A parser with no symbols registered. | |||
parser = LegacyPythonCallbacksParser(EmptyTable(), BuildFileAliases()) | |||
parser = LegacyPythonCallbacksParser(EmptyTable(), BuildFileAliases(), 'allow') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you use the kwarg here? I think build_file_imports_behavior='allow'
would be a lot clearer to someone looking at this later.
]) | ||
self.assert_success(pants_run) | ||
self.assertIn('Hello\n', pants_run.stdout_data) | ||
self.assertIn('testprojects/src/python/build_file_imports_module/BUILD', pants_run.stderr_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the test_banned_module_import
should assert this too, since it triggers a different execution path.
Would love to have this. I expect that if you move that option to the bootstrap options it will fix most/all of the test issues. |
504f3de
to
9872c4d
Compare
Addressed all comments, PTAL @stuhood @baroquebobcat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but would be nice not to reexecute the setup code for every parse.
six.exec_(python, dict(self._symbols)) | ||
|
||
symbols = dict(self._symbols) | ||
if self._build_file_import_behavior != 'allow': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this whole block should be in def _generate_symbols
in order to avoid running it for every parse?
You won't be able to get access to the complete filepath, but you should be able to lazily access ParseContext.rel_path
, which is mutated for each parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. ParseContext.rel_path
unfortunately is just the directory, not the file, which makes the warning slightly less useful... Added "in directory" to the warning message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add another (private) field to the ParseContext
if it helps.
return import_builtin(import_name, *args) | ||
|
||
|
||
def _fail_on_import(name, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is less useful than the other one because it doesn't include the bad file path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests assert that the file name is included in the error message.
$ ./pants --build-file-imports=error --print-exception-stacktrace=false run testprojects/src/python/build_file_imports_module:hello
Exception caught: (<class 'pants.build_graph.address_lookup_error.AddressLookupError'>)
Exception message: Build graph construction failed: ExecutionError Multiple exceptions encountered:
MappingError: Failed to parse testprojects/src/python/build_file_imports_module/BUILD:
import statements have been banned, but tried to import os.path
MappingError: Failed to parse testprojects/src/python/build_file_imports_module/BUILD:
import statements have been banned, but tried to import os.path
(Any idea why it's double-printing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because there were multiple roots, and they failed with the same error. Needs work. I'm going to try and convince Emily to work on #3695 soon, heh.
@@ -216,6 +216,11 @@ def register_bootstrap_options(cls, register): | |||
help='The path to the watchman UNIX socket. This can be overridden if the default ' | |||
'absolute path length exceeds the maximum allowed by the OS.') | |||
|
|||
# This option changes the parser behavior in a fundamental way (which currently invalidates | |||
# all caches), and needs to be parsed out early, so we make it a bootstrap option. | |||
register('--build-file-imports', choices=['allow', 'warn', 'error'], default='allow', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's default to warning for this so we can start recommending evolution away from imports now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
The default is to allow them. This allows greater control over the complexity, reproducibility, and speed of BUILD file parsing. It also opens up potential future opimisations such as parsing with a custom parser rather than a whole python environment.
9872c4d
to
558a4de
Compare
A new testproject intentionally has warnings when running by default.
@@ -168,14 +168,18 @@ def test_pantsd_run(self): | |||
pantsd_run(['list', '3rdparty:']) | |||
checker.await_pantsd() | |||
|
|||
# Some testprojects have warnings or errors, so skip them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, a TODO referencing a ticket would be good here.
@illicitonion would be good to test this in source before landing for any impact to list perf - I appear to be seeing a (functional) regression for |
@kwlzn Will do! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, mod a concern about test coverage reduction + the breakage in source.
pantsd_run(['list', ':']) | ||
checker.assert_running() | ||
|
||
pantsd_run(['list', '::']) | ||
pantsd_run(['list'] + all_targets_except_testprojects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like an undesirable coverage regression from pantsd's perspective to me. could you not just plumb build_file_imports_behavior='allow'
as the default for these tests, which shouldn't warn or throw errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can certainly do that. But other checks which we have in the repo explicitly exempt testprojects because we don't expect them to necessarily work or be valid. Given that, it seems that exempting them here is worthwhile, because otherwise next time someone adds an intentionally broken testproject, they'll need to touch this test again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's highly desirable to be able to write integration tests that are invalid. It's also highly desirable for running ./pants list ::
in the repo to not trigger warnings.
In general, to avoid having integration tests trigger warnings or failures, we've fiddled around with having non-BUILD files that are converted into BUILD files, and etc. That's slightly awkward, but it's also a potential solution here:
pants/tests/python/pants_test/backend/jvm/subsystems/test_jar_dependency_management_integration.py
Lines 28 to 31 in d30cca1
@contextmanager | |
def _testing_build_file(self): | |
with self.file_renamed(self.project, 'TEST_BUILD', 'BUILD'): | |
yield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about leaving the list ::
but just checking for errors (E
) vs warnings + errors (WE
)?
the warnings check is just a historical remnant I believe, I'm not sure it buys us anything meaningful here today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I'm going to default this to allow for now; we can have a discussion about changing that default in the future. It turns out that "restricted mode" is a thing; if 1: https://github.com/python/cpython/blob/2.7/Include/frameobject.h#L58 |
It turns out that restricted mode is slow.
Thanks @kwlzn for the diligence. It turns out that the moment you're running in restricted mode, everything runs slower (I guess there are some handy optimisations which get disabled). Re-implemented as a tokenizer rather than import hook. Ran some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the readline
call might be a bug, but if I'm just misreading, then feel free to shipit.
# But it's sufficient to tell most users who aren't being actively malicious that they're doing | ||
# something wrong, and it has a low performance overhead. | ||
if self._build_file_imports_behavior != 'allow' and 'import' in python: | ||
for token in tokenize.generate_tokens(StringIO(python).readline): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only going to read the first line of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate_tokens takes a function to read lines; note it's readline
not readline()
:)
@@ -216,6 +216,11 @@ def register_bootstrap_options(cls, register): | |||
help='The path to the watchman UNIX socket. This can be overridden if the default ' | |||
'absolute path length exceeds the maximum allowed by the OS.') | |||
|
|||
# This option changes the parser behavior in a fundamental way (which currently invalidates | |||
# all caches), and needs to be parsed out early, so we make it a bootstrap option. | |||
register('--build-file-imports', choices=['allow', 'warn', 'error'], default='warn', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you've gone through the trouble to tokenize (thanks!), how would you feel about having two options here:
--build-file-tokens-warn=['while']
--build-file-tokens-error=['import']
?
If we think that's overkill, then a-ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to save this for the future; it's just cosmetic to retrofit this, but I'd want a more thorough set of tests before exposing arbitrary tokens through a public interface.
Also, I really would like to set a tone moving forward by defaulting |
Leaving the default as warn :) Merging... |
…option `allow` (#9047) ### Problem Imports are not safe in BUILD files, as described in: #9040 (comment) We did not have a mechanism to warn about them until December 2017 via #5180, so we seem to have only maintained the misfeature due to backward compatibility. ### Solution We want to remove support for them, but most do so incrementally per the deprecation policy. First, we deprecate `--build-file-imports=allow` and at the same time deprecate the default being `warn` instead of `error`. Then, in 1.27.0.dev0, we will change the default to `error` and deprecate the option entirely. ### Result We will have remove a whole class of bugs for users. We'll also be able to make more assumptions about BUILD files. From #5180: > It also opens up potential future opimisations such as parsing with a custom parser rather than a whole python environment.
The default is to allow them.
This allows greater control over the complexity, reproducibility, and
speed of BUILD file parsing.
It also opens up potential future opimisations such as parsing with a
custom parser rather than a whole python environment.
I'm not sure how to hook this option up to make changing it invalidate BUILD file parse caches in the daemon... Pointers would be great!