Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Port match_path_globs to PyO3 #12225

Merged
merged 2 commits into from
Jul 22, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions src/python/pants/engine/internals/native_engine.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ from typing import Any, Sequence, TextIO

from typing_extensions import Protocol

from pants.engine.fs import PathGlobs
from pants.engine.internals.scheduler import Workunit, _PathGlobsAndRootCollection
from pants.engine.internals.session import SessionValues
from pants.engine.process import InteractiveProcess, InteractiveProcessResult
Expand All @@ -30,7 +29,6 @@ class RawFdRunner(Protocol):
stderr_fileno: int,
) -> int: ...

def match_path_globs(path_globs: PathGlobs, paths: tuple[str, ...]) -> str: ...
def capture_snapshots(
scheduler: PyScheduler,
session: PySession,
Expand Down
4 changes: 4 additions & 0 deletions src/python/pants/engine/internals/native_engine_pyo3.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,14 @@

from __future__ import annotations

from pants.engine.fs import PathGlobs

# TODO: black and flake8 disagree about the content of this file:
# see https://github.com/psf/black/issues/1548
# flake8: noqa: E302

def match_path_globs(path_globs: PathGlobs, paths: tuple[str, ...]) -> str: ...

class PyNailgunClient:
def __init__(self, port: int, executor: PyExecutor) -> None: ...
def execute(self, command: str, args: list[str], env: dict[str, str]) -> int: ...
Expand Down
2 changes: 1 addition & 1 deletion src/python/pants/source/filespec.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from typing_extensions import TypedDict

from pants.engine.fs import PathGlobs
from pants.engine.internals import native_engine
from pants.engine.internals import native_engine_pyo3 as native_engine


class _IncludesDict(TypedDict, total=True):
Expand Down
1 change: 1 addition & 0 deletions src/rust/engine/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions src/rust/engine/engine_pyo3/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ extension-module = ["pyo3/extension-module"]
default = []

[dependencies]
fs = { path = "../fs" }
hashing = { path = "../hashing" }
nailgun = { path = "../nailgun" }
parking_lot = "0.11"
Expand Down
2 changes: 2 additions & 0 deletions src/rust/engine/engine_pyo3/src/externs/interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,13 @@
use pyo3::exceptions::PyException;
use pyo3::prelude::*;

mod fs;
mod nailgun;
mod testutil;

#[pymodule]
fn native_engine_pyo3(py: Python, m: &PyModule) -> PyResult<()> {
self::fs::register(m)?;
self::nailgun::register(py, m)?;
self::testutil::register(m)?;

Expand Down
74 changes: 74 additions & 0 deletions src/rust/engine/engine_pyo3/src/externs/interface/fs.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
// Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).
// Licensed under the Apache License, Version 2.0 (see LICENSE).

use std::path::Path;

use fs::{GlobExpansionConjunction, PathGlobs, PreparedPathGlobs, StrictGlobMatching};
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

pub(crate) fn register(m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(match_path_globs, m)?)?;
Ok(())
}

struct PyPathGlobs(PathGlobs);

impl PyPathGlobs {
fn parse(self) -> PyResult<PreparedPathGlobs> {
self.0.clone().parse().map_err(|e| {
Copy link
Contributor Author

@Eric-Arellano Eric-Arellano Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! This clone() might explain the slowdown in the contrived benchmark. We clone an Arc instead with Rust-CPython.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that's not it. And the PathGlobs is cheap in the example. It's the paths: Vec<String> that's so huge and would be painful to clone.

Copy link
Sponsor Member

@stuhood stuhood Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this fn takes ownership of self, it shouldn't need to clone: it can consume self (including self.0).

EDIT: Oh. But we clone to render the error message. Hm.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the error type is String - presumably the glob which failed to parse? You might not need to clone the rest of the globs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably the glob which failed to parse? You might not need to clone the rest of the globs.

Yeah, although we want to show what self was in the error message so it's more useful. Not technically necessary, but I think it's fine because the PathGlobs object is usually pretty small (a vec of a few strings).

@stuhood thoughts?

PyValueError::new_err(format!(
"Failed to parse PathGlobs: {:?}\n\nError: {}",
self.0, e
))
})
}
}

impl<'source> FromPyObject<'source> for PyPathGlobs {
fn extract(obj: &'source PyAny) -> PyResult<Self> {
let globs: Vec<String> = obj.getattr("globs")?.extract()?;

let description_of_origin_field: String = obj.getattr("description_of_origin")?.extract()?;
let description_of_origin = if description_of_origin_field.is_empty() {
None
} else {
Some(description_of_origin_field)
};

let match_behavior_str: &str = obj
.getattr("glob_match_error_behavior")?
.getattr("value")?
.extract()?;
let match_behavior = StrictGlobMatching::create(match_behavior_str, description_of_origin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of hoping for an API closer to serde::Deserialize here, where a derive macro would to most of the field-mapping work for you, and you'd be able to write something like:

let path_globs: PathGlobs = pyo3_serde::from_object(obj)?;

rather than having to get each field by field-name and put them together yourself... That would probably necessitatedescription_of_origin being a field on glob_match_error_behavior rather than a sibling field in obj.

I guess this is PyO3/pyo3#566 and I definitely wouldn't suggest implementing it on the critical path of this conversion, but it would be nice to avoid that boilerplate... :)

Copy link
Sponsor Member

@stuhood stuhood Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth calling out that this wrapping of a Python object is an alternative to making the object native as we do for a few types in

py_class!(pub class PyDigest |py| {

I'm kindof curious what making the object native would look like with Py03's API... it might be less awkward than with rust-cpython's, and I expect that it would be more performant than double-constructing objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I didn't go with a native class because we already have examples of that and to land the port successfully I want to avoid rewriting everything to do that. But after the port is finished, it makes sense.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to do something like this to avoid a lot of the getattr boilerplate:

use pyo3::FromPyObject;

#[derive(FromPyObject)]
struct GlobMatchErrorBehavior<'a> {
    value: &'a str,
}

#[derive(FromPyObject)]
struct Conjunction<'a> {
    value: &'a str,
}

#[derive(FromPyObject)]
struct PyPathGlobs<'a> {
    globs: Vec<String>,
    description_of_origin: String,
    glob_match_error_behaviour: GlobMatchErrorBehavior<'a>,
    conjunction: Conjunction<'a>,
}

.map_err(PyValueError::new_err)?;

let conjunction_str: &str = obj.getattr("conjunction")?.getattr("value")?.extract()?;
let conjunction =
GlobExpansionConjunction::create(conjunction_str).map_err(PyValueError::new_err)?;

Ok(PyPathGlobs(PathGlobs::new(
globs,
match_behavior,
conjunction,
)))
}
}

#[pyfunction]
fn match_path_globs(
py_path_globs: PyPathGlobs,
paths: Vec<String>,
py: Python,
) -> PyResult<Vec<String>> {
py.allow_threads(|| {
let path_globs = py_path_globs.parse()?;
Ok(
paths
.into_iter()
.filter(|p| path_globs.matches(&Path::new(p)))
.collect(),
)
})
}
34 changes: 0 additions & 34 deletions src/rust/engine/src/externs/interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -154,11 +154,6 @@ py_module_initializer!(native_engine, |py, m| {
py_fn!(py, maybe_set_panic_handler()),
)?;

m.add(
py,
"match_path_globs",
py_fn!(py, match_path_globs(a: PyObject, b: Vec<String>)),
)?;
m.add(
py,
"write_digest",
Expand Down Expand Up @@ -1545,35 +1540,6 @@ fn lease_files_in_graph(
})
}

fn match_path_globs(
py: Python,
path_globs: PyObject,
paths: Vec<String>,
) -> CPyResult<Vec<String>> {
let matches = py
.allow_threads(|| {
let path_globs = nodes::Snapshot::lift_prepared_path_globs(&path_globs.into())?;

Ok(
paths
.into_iter()
.map(PathBuf::from)
.filter(|pb| path_globs.matches(pb.as_ref()))
.collect::<Vec<_>>(),
)
})
.map_err(|e: String| PyErr::new::<exc::ValueError, _>(py, (e,)))?;

matches
.into_iter()
.map(|pb| {
pb.into_os_string().into_string().map_err(|s| {
PyErr::new::<exc::Exception, _>(py, (format!("Could not decode {:?} as a string.", s),))
})
})
.collect::<Result<Vec<_>, _>>()
}

fn capture_snapshots(
py: Python,
scheduler_ptr: PyScheduler,
Expand Down