Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port from cffi to the cpython crate. #9593

Merged
merged 3 commits into from
May 31, 2020

Conversation

stuhood
Copy link
Sponsor Member

@stuhood stuhood commented Apr 19, 2020

Problem

When initially porting the engine to rust, we used cffi because it had a mode that avoided the need to actually link against Python. But somewhere along the way, we began linking against Python, and needed to add C shims to expose a proper module. Though complex, it has served us well for a few years.

But during that same time, Rust python-integration crates like cpython have matured, providing significant safety and usability benefits over a raw C API.

Solution

Port to the cpython crate. When compared to usage of CFFI, this eliminates the passing of raw pointers back and forth across the ffi boundary (improving errors), and allows for direct access to Python objects from Rust (while the GIL is acquired). The cpython crate was chosen over PyO3 because it doesn't require a nightly Rust compiler, and is vouched for by the mercurial developers.

The cpython crate does not use PEP-384, and updating it to do so would be a significant undertaking: for example, the PyObject::call interface relies on PyTuple, which is not present under PEP-384. It would likely also erase some of the performance and usability benefit of this API. So in order to switch to this crate, @Eric-Arellano did a ton of pre-work (culminating in #9881) to switch away from publishing ABI3 wheels, and toward publishing a wheel-per-python3.

Result

./pants list :: is 8-10% faster, usage is easier and safer, and there is less code to maintain. Additionally, since pyoxidizer (see #7369) relies on the cpython crate, this helps to unblock potentially switching to distributing Pants as a native binary with an embedded Python.

@stuhood stuhood force-pushed the stuhood/cpython-crate branch 2 times, most recently from ce80b6b to e38e759 Compare April 20, 2020 04:21
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic! Have you done any profiling to check if this impacts performance?

src/python/pants/engine/native.py Outdated Show resolved Hide resolved
@@ -132,45 +128,13 @@ def __init__(

def graph_trace(self, session, execution_request):
with temporary_file_path() as path:
self._native.lib.graph_trace(self._scheduler, session, execution_request, path.encode())
self._native.lib.graph_trace(self._scheduler, session, execution_request, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, so it's safe to serialize unicode? Neat!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust and Python3 both use utf8-encoded unicode strings as their main string primitives (which is a very sane decision), so I'm not surprised this could be made to work automatically.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not inherently zero copy (some conversion still happens at the call boundary), but it is possible to do some things as zero copy by accepting a PyString on the rust side, and then temporarily viewing it as a Cow<str> (copy on write &str): https://docs.rs/cpython/0.5.0/cpython/struct.PyString.html#method.to_string

src/python/pants/engine/scheduler.py Outdated Show resolved Hide resolved
src/python/pants/engine/scheduler.py Outdated Show resolved Hide resolved
src/rust/engine/src/context.rs Show resolved Hide resolved
src/rust/engine/src/core.rs Show resolved Hide resolved
src/python/pants/engine/native.py Outdated Show resolved Hide resolved
@@ -132,45 +128,13 @@ def __init__(

def graph_trace(self, session, execution_request):
with temporary_file_path() as path:
self._native.lib.graph_trace(self._scheduler, session, execution_request, path.encode())
self._native.lib.graph_trace(self._scheduler, session, execution_request, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust and Python3 both use utf8-encoded unicode strings as their main string primitives (which is a very sane decision), so I'm not surprised this could be made to work automatically.

@stuhood stuhood force-pushed the stuhood/cpython-crate branch 3 times, most recently from 82a10c8 to 396e3c0 Compare April 23, 2020 23:31
stuhood added a commit that referenced this pull request Apr 24, 2020
### Problem

Two tests are failing on master, one of which affected #9593, and both of which repro locally. Not sure if this is Python version differences.

[ci skip-rust-tests]
[ci skip-jvm-tests]
@stuhood stuhood force-pushed the stuhood/cpython-crate branch 2 times, most recently from be78faf to 284d590 Compare April 24, 2020 14:49
@stuhood stuhood marked this pull request as draft April 24, 2020 20:26
@stuhood
Copy link
Sponsor Member Author

stuhood commented Apr 24, 2020

I've updated the description to cover some other benefits and downsides. I'll continue working on this in the background, but we'll likely want to drop Centos6 before attempting to ship more wheels: see #7288.

@stuhood stuhood marked this pull request as ready for review May 31, 2020 00:53
@stuhood
Copy link
Sponsor Member Author

stuhood commented May 31, 2020

I was able to fix the rust tests shard, and thanks to all of @Eric-Arellano's hard work this is now unblocked and reviewable. Thanks!

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! Great work!

Comment on lines +29 to +30
// NB: When built with Python 3, `native_engine.so` only works with a Python 3 interpreter.
// When built with Python 2, it works with both Python 2 and Python 3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale. We always use Py3, but it is true that Py3.6 != Py3.7 etc.

Value(Arc::new(handle))
}

// NB: Longer name because overloaded in a few places.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a longer name?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consume_into_py_object

@@ -0,0 +1,1608 @@
// Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).
// Licensed under the Apache License, Version 2.0 (see LICENSE).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not generated, right? It's pretty big so didn't do a close review. Looks good from a skim. Lmk if I should ready closely.

Copy link
Sponsor Member Author

@stuhood stuhood May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the file formerly known as engine_cffi/src/lib.rs: if you look at the commits independently, you can mostly ignore the second one, as it's a rename.

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a nicer API performance completely aside.

@stuhood stuhood merged commit 9f2c65b into pantsbuild:master May 31, 2020
@stuhood stuhood deleted the stuhood/cpython-crate branch May 31, 2020 05:36
Eric-Arellano added a commit that referenced this pull request Jun 8, 2021
## Rust-cpython vs PyO3

While developing my talk on Rust native extensions, I found PyO3 has had lots of traction recently, including Python cryptography now using it. PyO3 started as a fork, but has since diverged: https://pyo3.rs/v0.13.2/rust_cpython.html.

We went with rust-cpython instead of PyO3 for Rust FFI in #9593 because, at the time, PyO3 required the Rust nightly compiler. While this was a huge win over CFFI, PyO3 would now bring us several benefits:

* Uses procedural macros, rather than a custom `macro_rules!` DSL. This is more natural to work with and better understood by IDEs, rustfmt, etc.
* Great documentation, including a book: https://pyo3.rs/v0.13.2/index.html.
* Better error handling, especially due to not requiring `Python`. See https://pyo3.rs/v0.13.2/rust_cpython.html#error-handling.
~* Can use the Python stable ABI again, meaning we only need to build one wheel per platform, rather than platform x interpreter. (Altho we would need to stop using `PyBuffer` API.) See https://pyo3.rs/v0.13.2/building_and_distribution.html#py_limited_apiabi3.~ We're unlikely to use this due to performance hit.

The community has been very responsive too: PyO3/pyo3#1625, and they have an active Gitter room.

## Incremental migration

To reduce risk and complexity, we use an incremental migration to PyO3 by building two distinct native extensions. At first, `native_engine_pyo3` will only have self-contained functionality, e.g. `PyStubCAS` and, soon, the Nailgun server.

Because the migration is incremental, we can more safely improve our FFI implementation along the way, rather than merely preserving the status quo. For example, this PR makes the `PyStubCAS` API more ergnonomic. 

When we reach critical mass with PyO3, `native_engine_pyo3` will be renamed to `native_engine` and `native_engine` will be changed to `native_engine_cpython`. (Or, we'll remove rust-cpython in one big swoop). This will result in some churn in our Python files (due to import module changing), but I argue that's worth the reduction in risk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants