
python task execution can bomb with an unhandled OSError when interpreter cache goes stale. #3416

Closed
kwlzn opened this issue May 12, 2016 · 4 comments

@kwlzn
Member

kwlzn commented May 12, 2016

observed by one of our users while attempting a ./pants binary <target>:

13:33:53 00:00 [main]
               (To run a reporting server: ./pants server)
13:33:53 00:00   [setup]
13:33:53 00:00     [parse]
               Executing tasks in goals: bootstrap -> imports -> unpack-jars -> jvm-platform-validate -> deferred-sources -> compute-buildprops -> gen -> resolve -> resources -> buildprops -> compile -> binary
13:33:53 00:00   [bootstrap]
13:33:53 00:00     [jar-dependency-management]
13:33:54 00:01     [bootstrap-jvm-tools]
13:33:54 00:01     [thriftstore-dep-inject]
13:33:54 00:01     [validate-tags]
13:33:54 00:01     [validate-deps]
13:33:54 00:01     [validate-provides]
13:33:54 00:01   [imports]
13:33:54 00:01     [ivy-imports]
13:33:54 00:01   [unpack-jars]
13:33:54 00:01     [unpack-jars]
13:33:54 00:01   [jvm-platform-validate]
13:33:54 00:01     [jvm-platform-validate]
13:33:54 00:01   [deferred-sources]
13:33:54 00:01     [deferred-sources]
13:33:54 00:01   [compute-buildprops]
13:33:54 00:01     [compute-buildprops]
13:33:54 00:01   [gen]
13:33:54 00:01     [dataset]
13:33:54 00:01     [thrift-linter]
13:33:54 00:01     [thrift]
13:33:54 00:01     [protoc]
13:33:54 00:01     [antlr]
13:33:54 00:01     [ragel]
13:33:54 00:01     [jaxb]
13:33:54 00:01     [wire]
13:33:54 00:01     [go-thrift]
13:33:54 00:01     [scrooge]
13:33:54 00:01     [thriftstore-dml-gen]
13:33:54 00:01     [bootstrap]
13:33:54 00:01   [resolve]
13:33:54 00:01     [ivy]
13:33:54 00:01     [go]
13:33:54 00:01     [scala-js-compile]
13:33:54 00:01     [scala-js-link]
13:33:54 00:01     [node]
13:33:54 00:01   [resources]
13:33:54 00:01     [prepare]
13:33:54 00:01     [services]
13:33:54 00:01   [buildprops]
13:33:54 00:01     [binary-buildprops]
13:33:54 00:01   [compile]
13:33:54 00:01     [compile-jvm-prep-command]
13:33:54 00:01       [jvm_prep_command]
13:33:54 00:01     [compile-prep-command]
13:33:54 00:01     [compile]
13:33:54 00:01     [zinc]
13:33:54 00:01     [go]
13:33:54 00:01     [gofmt]
13:33:54 00:01     [python-eval]
13:33:54 00:01     [library-buildprops]
13:33:54 00:01   [binary]
13:33:54 00:01     [binary-jvm-prep-command]
13:33:54 00:01       [jvm_prep_command]
13:33:54 00:01     [binary-prep-command]
13:33:54 00:01     [python-binary-create]
13:33:54 00:01       [chroot]
               Waiting for background workers to finish.
13:33:55 00:02   [complete]
               FAILURE

Exception caught: (<type 'exceptions.OSError'>)
  File ".bootstrap/_pex/pex.py", line 317, in execute
    self._wrap_coverage(self._wrap_profiling, self._execute)
  File ".bootstrap/_pex/pex.py", line 250, in _wrap_coverage
    runner(*args)
  File ".bootstrap/_pex/pex.py", line 282, in _wrap_profiling
    runner(*args)
  File ".bootstrap/_pex/pex.py", line 360, in _execute
    return self.execute_entry(self._pex_info.entry_point)
  File ".bootstrap/_pex/pex.py", line 418, in execute_entry
    runner(entry_point)
  File ".bootstrap/_pex/pex.py", line 436, in execute_pkg_resources
    runner()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/pants_exe.py", line 44, in main
    PantsRunner(exiter).run()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/pants_runner.py", line 53, in run
    options_bootstrapper=options_bootstrapper)
  File "/Users/$USER/.pex/install/$WHL/pants/bin/pants_runner.py", line 43, in _run
    return LocalPantsRunner(exiter, args, env, options_bootstrapper=options_bootstrapper).run()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/local_pants_runner.py", line 49, in run
    self._maybe_profiled(self._run)
  File "/Users/$USER/.pex/install/$WHL/pants/bin/local_pants_runner.py", line 46, in _maybe_profiled
    runner()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/local_pants_runner.py", line 89, in _run
    result = goal_runner.run()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/goal_runner.py", line 335, in run
    result = self._execute_engine()
  File "/Users/$USER/.pex/install/$WHL/pants/bin/goal_runner.py", line 324, in _execute_engine
    result = engine.execute(self._context, self._goals)
  File "/Users/$USER/.pex/install/$WHL/pants/engine/engine.py", line 26, in execute
    self.attempt(context, goals)
  File "/Users/$USER/.pex/install/$WHL/pants/engine/round_engine.py", line 224, in attempt
    goal_executor.attempt(explain)
  File "/Users/$USER/.pex/install/$WHL/pants/engine/round_engine.py", line 47, in attempt
    task.execute()
  File "/Users/$USER/.pex/install/$WHL/pants/backend/python/tasks/python_binary_create.py", line 39, in execute
    self.create_binary(binary)
  File "/Users/$USER/.pex/install/$WHL/pants/backend/python/tasks/python_binary_create.py", line 52, in create_binary
    platforms=binary.platforms) as chroot:
  File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Users/$USER/.pex/install/$WHL/pants/backend/python/tasks/python_task.py", line 175, in temporary_chroot
    extra_requirements, executable_file_content)
  File "/Users/$USER/.pex/install/$WHL/pants/backend/python/tasks/python_task.py", line 198, in _build_chroot
    builder.freeze()
  File "/Users/$USER/.pex/install/$PEX_WHL/pex/pex_builder.py", line 408, in freeze
    self._precompile_source()
  File "/Users/$USER/.pex/install/$PEX_WHL/pex/pex_builder.py", line 332, in _precompile_source
    compiled_relpaths = compiler.compile(self._chroot.path(), source_relpaths)
  File "/Users/$USER/.pex/install/$PEX_WHL/pex/compiler.py", line 86, in compile
    stderr=subprocess.PIPE)
  File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception

Exception message: [Errno 2] No such file or directory

which ended up being the attempted execution of a dangling symlink to the python interpreter as stored in pants' interpreter cache - along the lines of:

>>> os.path.exists(os.readlink('python'))
False
>>> subprocess.Popen(['./python', 'blah'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/opt/twitter_mde/package/python2.7/current/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

the workaround was a ./pants clean-all which removes the interpreter cache and allows for a clean rebuild without the dangling symlinks.

I'm thinking this needs two bits of love:

  1. this particular subprocess.Popen call in pex's compiler.py and potentially wherever else this pattern is repeated should trap OSError and surface a more meaningful exception with the execution failure context.

  2. we should do a better job of validating and pruning the interpreter cache run over run to avoid staleness that can cause this in the first place.
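a minimal sketch of what (1) could look like, assuming a small wrapper around subprocess.Popen. the names `spawn` and `ExecutionError` are hypothetical and are not taken from pex; the point is only that the command being executed ends up in the error message instead of a bare `[Errno 2]`:

```python
import subprocess


class ExecutionError(Exception):
    """Hypothetical error type that carries the failed command for context."""


def spawn(cmd, **kwargs):
    """Start cmd, converting a bare OSError into an error that names the command."""
    try:
        return subprocess.Popen(cmd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                **kwargs)
    except OSError as e:
        # A dangling interpreter symlink fails here with ENOENT; include the
        # command so the user can see which path could not be executed.
        raise ExecutionError('failed to execute {!r}: {}'.format(cmd, e))
```

with something like this, `spawn(['./python', 'blah'])` against a dangling symlink would raise an `ExecutionError` that mentions `./python`, rather than an anonymous `OSError` from deep inside subprocess.py.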

@kwlzn
Member Author

kwlzn commented May 20, 2016

part 1 is reviewable here (pex uses the PR workflow): pex-tool/pex#255

kwlzn added the fix-it label Jul 13, 2016
@kwlzn
Member Author

kwlzn commented Jul 13, 2016

#255 has landed in pex master.

@nsaechao
Contributor

Pants caches the python interpreter by symlinking to it from a known (trusted) location in the local file system. This breaks when the local python interpreter is itself a symlink to another location and that target changes: pants doesn't know about the change and has no runtime check to validate and repair the interpreter cache. Pants then concludes that no python interpreter is available even though one is, and the end result is an unstable or non-working pants. The current fix is manual: either run ./pants clean-all or clean up the pants python interpreter cache directory by hand.
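As a rough illustration of the kind of runtime check that is missing, here is a sketch that walks a cache directory and removes any symlink whose target no longer exists. The function name `prune_dangling_symlinks` and the `cache_dir` argument are hypothetical; this is not how pants lays out or manages its interpreter cache, only an example of detecting the stale state described above.

```python
import os


def prune_dangling_symlinks(cache_dir):
    """Remove symlinks under cache_dir whose targets are gone; return removed paths."""
    removed = []
    for root, dirs, files in os.walk(cache_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            # os.path.exists follows symlinks, so it is False for a dangling
            # link even though os.path.islink is still True.
            if os.path.islink(path) and not os.path.exists(path):
                os.unlink(path)
                removed.append(path)
    return removed
```

Running a check like this before selecting an interpreter would let the cache be rebuilt for the missing entries instead of failing later at execution time.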

@CMLivingston
Contributor

Closed by #7225.
