Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import statements can be banned in BUILD files #5180

Merged
merged 6 commits into from
Dec 20, 2017

Conversation

illicitonion
Copy link
Contributor

The default is to allow them.

This allows greater control over the complexity, reproducibility, and
speed of BUILD file parsing.

It also opens up potential future opimisations such as parsing with a
custom parser rather than a whole python environment.

I'm not sure how to hook this option up to make changing it invalidate BUILD file parse caches in the daemon... Pointers would be great!

@stuhood
Copy link
Sponsor Member

stuhood commented Dec 6, 2017

@illicitonion : I believe that all global options are fingerprinted to trigger daemon restarts via #5045, but @kwlzn should be able to confirm.

@stuhood
Copy link
Sponsor Member

stuhood commented Dec 6, 2017

Would it be more general to have a list of banned tokens maybe?

@illicitonion
Copy link
Contributor Author

@stuhood If we're just trying to attack import, I think that banning tokens is about as safe and general as hooking import, but noticeably harder; off the top of my head, trying to ban import by token would be regexing over the input python for '[^:valid_in_variable_names:]import[^:valid_in_variable_names:]' or anything which could be constructed to be 'import' (such as '' + 'import').

Are there other tokens you'd like to ban, which would make token search a better approach?



def _fail_on_import(name, *args):
raise Exception('import statements have been banned, but tried to import {}'.format(name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a more relevant exception type that could be used here and above vs Exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very probably... pants.engine.parser.ParseError, pants.build_graph.build_file_parser.ParseError, pants.base.exceptions.TargetDefinitionException, or a new subclass of pants.base.build_file.BuildFileError all seem like viable options.

I'm guessing pants.engine.parser.ParseError is probably the winner, because this file already uses things in that module? I've switched to that one for now...

@@ -0,0 +1,3 @@
from __future__ import print_function

print("Hello")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this go under testprojects/src/python/... instead? src/java is unexpected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a more general and vague rule, like genrule, rather than me needing to arbitrarily pick python as a language to write hello world in? Because really, this is very language agnostic...

But either way, moved.


python_binary(
name = os.path.join("hello"),
source = "hello.py",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 space indent is the repo standard here too, I believe.. but maybe we're just going to be auto-formatting these soon anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


options, _ = OptionsInitializer(OptionsBootstrapper()).setup()
global_options = options.for_global_scope()
if global_options.build_file_imports != 'allow':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this strikes me as odd to do on every .parse().. but maybe it's cached somehow as global state?

either way, how about plumbing the value of global_options.build_file_imports here as an instance attribute and letting an external caller handle the options parsing bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the caller, which is already doing this lookup. If there's a more cached/singeltony way of doing this, I'd love to see it, this was just the code I saw being used for this purpose in a bunch of other places...

else:
raise Exception("Didn't know what to do for build_file_imports value {}".format(global_options.build_file_imports))
builtins['__import__'] = import_hook
symbols['__builtins__'] = builtins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not do this injection as part of self._generate_symbols() when self._symbols is computed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need filepath to give a good error message in the warn case.

bootstrap_options_values = bootstrap_options.for_global_scope()

options, build_config = OptionsInitializer(options_bootstrapper).setup(init_logging=False)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this will re-force options parsing... we're already passing in the bootstrap_options.

Would it be possible to just define the new option as a bootstrap option instead...?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(...yes, I think so. Let's do that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a bootstrap option, reverted all of the change around option parsing, I'll send that as a separate PR because I think there's stuff to be cleaned up there, but that should be separate :)

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the same kwarg for both import behavior flag? build_file_imports_behavior vs build_file_import_behavior is confusing to me. I think build_file_imports_behavior makes sense since build_file_imports is the name of the option.

@@ -16,7 +16,7 @@ class LegacyPythonCallbacksParserTest(unittest.TestCase):

def test_no_import_sideeffects(self):
# A parser with no symbols registered.
parser = LegacyPythonCallbacksParser(EmptyTable(), BuildFileAliases())
parser = LegacyPythonCallbacksParser(EmptyTable(), BuildFileAliases(), 'allow')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you use the kwarg here? I think build_file_imports_behavior='allow' would be a lot clearer to someone looking at this later.

])
self.assert_success(pants_run)
self.assertIn('Hello\n', pants_run.stdout_data)
self.assertIn('testprojects/src/python/build_file_imports_module/BUILD', pants_run.stderr_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the test_banned_module_import should assert this too, since it triggers a different execution path.

@stuhood
Copy link
Sponsor Member

stuhood commented Dec 16, 2017

Would love to have this. I expect that if you move that option to the bootstrap options it will fix most/all of the test issues.

@illicitonion
Copy link
Contributor Author

Addressed all comments, PTAL @stuhood @baroquebobcat

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but would be nice not to reexecute the setup code for every parse.

six.exec_(python, dict(self._symbols))

symbols = dict(self._symbols)
if self._build_file_import_behavior != 'allow':
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this whole block should be in def _generate_symbols in order to avoid running it for every parse?

You won't be able to get access to the complete filepath, but you should be able to lazily access ParseContext.rel_path, which is mutated for each parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. ParseContext.rel_path unfortunately is just the directory, not the file, which makes the warning slightly less useful... Added "in directory" to the warning message.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add another (private) field to the ParseContext if it helps.

return import_builtin(import_name, *args)


def _fail_on_import(name, *args):
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is less useful than the other one because it doesn't include the bad file path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests assert that the file name is included in the error message.

$ ./pants --build-file-imports=error --print-exception-stacktrace=false run testprojects/src/python/build_file_imports_module:hello
Exception caught: (<class 'pants.build_graph.address_lookup_error.AddressLookupError'>)

Exception message: Build graph construction failed: ExecutionError Multiple exceptions encountered:
  MappingError: Failed to parse testprojects/src/python/build_file_imports_module/BUILD:
import statements have been banned, but tried to import os.path
  MappingError: Failed to parse testprojects/src/python/build_file_imports_module/BUILD:
import statements have been banned, but tried to import os.path

(Any idea why it's double-printing?)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because there were multiple roots, and they failed with the same error. Needs work. I'm going to try and convince Emily to work on #3695 soon, heh.

@@ -216,6 +216,11 @@ def register_bootstrap_options(cls, register):
help='The path to the watchman UNIX socket. This can be overridden if the default '
'absolute path length exceeds the maximum allowed by the OS.')

# This option changes the parser behavior in a fundamental way (which currently invalidates
# all caches), and needs to be parsed out early, so we make it a bootstrap option.
register('--build-file-imports', choices=['allow', 'warn', 'error'], default='allow',
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's default to warning for this so we can start recommending evolution away from imports now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

The default is to allow them.

This allows greater control over the complexity, reproducibility, and
speed of BUILD file parsing.

It also opens up potential future opimisations such as parsing with a
custom parser rather than a whole python environment.
A new testproject intentionally has warnings when running by default.
@@ -168,14 +168,18 @@ def test_pantsd_run(self):
pantsd_run(['list', '3rdparty:'])
checker.await_pantsd()

# Some testprojects have warnings or errors, so skip them.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, a TODO referencing a ticket would be good here.

@kwlzn
Copy link
Member

kwlzn commented Dec 19, 2017

@illicitonion would be good to test this in source before landing for any impact to list perf - I appear to be seeing a (functional) regression for ./pants list ::.

@illicitonion
Copy link
Contributor Author

@kwlzn Will do!

Copy link
Member

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, mod a concern about test coverage reduction + the breakage in source.

pantsd_run(['list', ':'])
checker.assert_running()

pantsd_run(['list', '::'])
pantsd_run(['list'] + all_targets_except_testprojects)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like an undesirable coverage regression from pantsd's perspective to me. could you not just plumb build_file_imports_behavior='allow' as the default for these tests, which shouldn't warn or throw errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly do that. But other checks which we have in the repo explicitly exempt testprojects because we don't expect them to necessarily work or be valid. Given that, it seems that exempting them here is worthwhile, because otherwise next time someone adds an intentionally broken testproject, they'll need to touch this test again.

@kwlzn @stuhood WDYT?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's highly desirable to be able to write integration tests that are invalid. It's also highly desirable for running ./pants list :: in the repo to not trigger warnings.

In general, to avoid having integration tests trigger warnings or failures, we've fiddled around with having non-BUILD files that are converted into BUILD files, and etc. That's slightly awkward, but it's also a potential solution here:

@contextmanager
def _testing_build_file(self):
with self.file_renamed(self.project, 'TEST_BUILD', 'BUILD'):
yield

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about leaving the list :: but just checking for errors (E) vs warnings + errors (WE)?

the warnings check is just a historical remnant I believe, I'm not sure it buys us anything meaningful here today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@illicitonion
Copy link
Contributor Author

I'm going to default this to allow for now; we can have a discussion about changing that default in the future.

It turns out that "restricted mode" is a thing; if __builins__ is not the same as the global __builtins__ [1] (as is the case here), some constructs stop being available to the interpreter. In particular, the file() constructor is banned, so no files can be opened if __builtins__ is changed. While this is probably good (people should definitely not be opening files in BUILD files), it's a more breaking change than I'm comfortable quietly making.

1: https://github.com/python/cpython/blob/2.7/Include/frameobject.h#L58

@illicitonion
Copy link
Contributor Author

Thanks @kwlzn for the diligence. It turns out that the moment you're running in restricted mode, everything runs slower (I guess there are some handy optimisations which get disabled).

Re-implemented as a tokenizer rather than import hook.

Ran some ./pants list ::s over a large repo to check we're not regressing any more.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the readline call might be a bug, but if I'm just misreading, then feel free to shipit.

# But it's sufficient to tell most users who aren't being actively malicious that they're doing
# something wrong, and it has a low performance overhead.
if self._build_file_imports_behavior != 'allow' and 'import' in python:
for token in tokenize.generate_tokens(StringIO(python).readline):
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only going to read the first line of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate_tokens takes a function to read lines; note it's readline not readline() :)

@@ -216,6 +216,11 @@ def register_bootstrap_options(cls, register):
help='The path to the watchman UNIX socket. This can be overridden if the default '
'absolute path length exceeds the maximum allowed by the OS.')

# This option changes the parser behavior in a fundamental way (which currently invalidates
# all caches), and needs to be parsed out early, so we make it a bootstrap option.
register('--build-file-imports', choices=['allow', 'warn', 'error'], default='warn',
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you've gone through the trouble to tokenize (thanks!), how would you feel about having two options here:

--build-file-tokens-warn=['while']
--build-file-tokens-error=['import']

?

If we think that's overkill, then a-ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to save this for the future; it's just cosmetic to retrofit this, but I'd want a more thorough set of tests before exposing arbitrary tokens through a public interface.

@stuhood
Copy link
Sponsor Member

stuhood commented Dec 20, 2017

Also, I really would like to set a tone moving forward by defaulting import to warn.

@illicitonion
Copy link
Contributor Author

Leaving the default as warn :) Merging...

@illicitonion illicitonion merged commit 6052b2a into pantsbuild:master Dec 20, 2017
@illicitonion illicitonion deleted the dwagnerhall/importerrors branch February 26, 2018 11:29
Eric-Arellano added a commit that referenced this pull request Feb 1, 2020
…option `allow` (#9047)

### Problem
Imports are not safe in BUILD files, as described in:

#9040 (comment)

We did not have a mechanism to warn about them until December 2017 via #5180, so we seem to have only maintained the misfeature due to backward compatibility. 

### Solution
We want to remove support for them, but most do so incrementally per the deprecation policy. First, we deprecate `--build-file-imports=allow` and at the same time deprecate the default being `warn` instead of `error`. Then, in 1.27.0.dev0, we will change the default to `error` and deprecate the option entirely.

### Result
We will have remove a whole class of bugs for users. We'll also be able to make more assumptions about BUILD files. From #5180:

> It also opens up potential future opimisations such as parsing with a
custom parser rather than a whole python environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants