Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Poetry macros to generate Python requirements and constraints #10655

Closed
Eric-Arellano opened this issue Aug 19, 2020 · 16 comments · Fixed by #12174
Closed

Add Poetry macros to generate Python requirements and constraints #10655

Eric-Arellano opened this issue Aug 19, 2020 · 16 comments · Fixed by #12174
Assignees

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Aug 19, 2020

Some orgs use Poetry for dependency management because it has an excellent resolver and lockfile support. While they can translate from Poetry into the pip/Pex world of requirements.txt and constraints.txt, this requires running poetry export. Pants can automate some of that via macros, similar to the python_requirements() macro converting requirements.txt into python_requirement_library targets.

We want to distinguish between the high-level pyproject.toml, where the dependencies you want are declared, vs. poetry.lock, where all transitive deps are used. The former should be used to generate python_requirement_library targets, which get used for things like dependency inference. The latter should be used for a constraints file.

Strawman design:

# BUILD
poetry_requirements(
  pyproject="pyproject.toml",   # the default
  lock="poetry.lock",  # the default
  module_mapping={"my_dist": ["module1"]},   # if you want augment Pants's default module mapping
)

Relates to #11710 for better Pipenv support.

@Eric-Arellano Eric-Arellano added this to the 2.0.0rc0 milestone Aug 19, 2020
Eric-Arellano pushed a commit that referenced this issue Aug 22, 2020
Part of #10655.

This macro will read a `Pipfile.lock` and convert everything into a `python_requirement_library`. It won't actually treat this as a true lockfile, i.e. it won't feed the results into the option `--python-setup-requirements-constraints`. That will need to come in a followup. Rather, this acts like the macro `python_requirements()`, just with a Pipfile.lock.
@stuhood
Copy link
Sponsor Member

stuhood commented Sep 9, 2020

@jsirois : This might relate to some of what you're looking at: would be good to sync with Eric on it when you get back.

@Eric-Arellano Eric-Arellano changed the title Support Poetry and Pipenv Support from poetry.lock and pipenv.lock Sep 24, 2020
@Eric-Arellano Eric-Arellano changed the title Support from poetry.lock and pipenv.lock Support poetry.lock and pipenv.lock Sep 24, 2020
@Eric-Arellano
Copy link
Contributor Author

This is a great feature to have, but not going to make it into 2.0. While this workaround isn't ergonomic, Poetry and Pipenv users can in the meantime use the export commands to export into requirements.txt and constraints.txt files.

We can pick this back up for 2.1 or 2.2, as our focus for the first 2.x releases is futher improving Python, before we venture out to new languages.

@blarghmatey
Copy link

Another useful capability if integrating with Poetry and Pipenv is to allow for propagating the constraint ranges to the generated setup.py files when packaging a project. So, rather than setting the install_requires = {'foo==1.2.3'} when Poetry has foo = "^1.0.0", it would generate something like install_requires = {'foo>=1.0.0,<2'}.

@lucinvitae
Copy link

Hi @stuhood (long time no see!), my company is evaluating pants as a build tool for some of our python-based ML repositories, which currently use Poetry for package/dependency management.

Is there any way this feature will make it into any of the upcoming Pants 2.X releases? And - especially if that answer is no - would having a few devs like us collaborate on the feature help improve its chances of getting released?

Thanks in advance, and let us know how we can help.

@stuhood
Copy link
Sponsor Member

stuhood commented Mar 15, 2021

Hey @lucinvitae! Good to hear from you!

Is there any way this feature will make it into any of the upcoming Pants 2.X releases?

It is not currently prioritized: we updated on the roadmap recently over here: https://groups.google.com/g/pants-devel/c/F8Saug3BrFw/m/ZtWDP4YmDwAJ

And - especially if that answer is no - would having a few devs like us collaborate on the feature help improve its chances of getting released?

We would love to collaborate with you on this one!

As you've seen, Pants' existing support for lockfiles is relatively limited:

  1. they are manually generated
  2. only one is supported per repository
  3. only constraints.txt is supported

We have work scheduled to resolve the first two issues in the next month or two: #11165 ... but the last issue is what this ticket deals with.


There are a few very different designs for this feature, and @Eric-Arellano has offered to help sketch those out in the next few days: if you have any thoughts on how you'd like the integration to work, they'd be very helpful in the meantime!

@Eric-Arellano Eric-Arellano changed the title Support poetry.lock and pipenv.lock Add Poetry macros to generate Python requirements and constraints Mar 16, 2021
@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Mar 16, 2021

Hey @lucinvitae, awesome to hear! As Stu said, we'd love the help and are eager to assist to pull this off.

--

The first step is checking that we're on the same page for what Poetry support entails. I've updated this PR's title and description to focus on Poetry, rather than including Pipenv. Could you please check that that's what you're envisioning?

Note that Pants will still use Pex to do the actual resolve, which uses pip under the hood. All this PR is doing is automating you as an admin having to run poetry export to generate a requirements.txt and constraints.txt file.

Likewise, this is a lossy conversion. A Poetry lockfile says specifically which file to download, e.g.:

 {file = "psutil-5.7.0-cp37-cp37m-win32.whl", hash = "sha256:d008ddc00c6906ec80040d26dc2d3e3962109e40ad07fd8a12d0284ce5e0e4f8"},

Instead, we will only be able to capture this information:

[[package]]
name = "psutil"
version = "5.7.0"
description = "Cross-platform lib for process and system monitoring in Python."
category = "main"
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"

For example, we would likely convert that into psutil==5.7.0 in a constraints.txt file. Is this lossiness okay with what you're looking for?

Instructions

We will be creating a macro, very similar to the python_requirements() and pipenv_requirements() macros. Both do similar things:

  1. In __call__, take some args that will get used in the BUILD file.
  2. Create a _python_requirements_file(), which is a somewhat hacky way that we ensure correct caching. All the targets you're creating will depend on this, so when pyproject.toml and poetry.lock change, we know to invalidate those targets. The target type is a bit of a misnomer, but it's fine to use here.
  3. Use Path().read_text() to read the file, then parse it. You want back pkg_resources.Requirement objects.
    • The parsing will probably be the trickiest part. We do have access to the toml library. It'd be valuable to see prior art from other tools, including how Poetry itself consumes this file. Definitely do feel free to talk through you're thinking on parsing before getting too far into coding - it's the biggest unknown imo.
  4. Some module_mapping code which can be copy and pasted.
  5. Convert each Requirement object into a python_requirement_library target.

I recommend modeling this as a single macro, something like poetry_project() from the PR description.

Unlike those two macros, we will be creating two types of targets: a python_requirement_library target for each entry in pyproject.toml, which is what gets used for things like dep inference; and a single _python_constraints() target representing the entirety of poetry.lock.

We should also create two _python_requirements_file() targets, one for the pyproject.toml and another for poetry.lock. Set the python_requirement_library targets to depend on the former, and _python_constraints to depend on the latter.

Then, add tests similar to https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/python/macros/python_requirements_test.py.

@Eric-Arellano
Copy link
Contributor Author

@lucinvitae FYI the prework has landed with #11724, so you should now be fully unblocked. I simplified the above instructions based on that landing :)

@cognifloyd
Copy link
Member

cognifloyd commented Mar 29, 2021

Two thoughts:

  1. A plugin system was just merged into Poetry (Implement a plugin system python-poetry/poetry#3733). A plugin could probably be added to poetry that adds relevant stuff (whatever that looks like - I'm too new to say) in the BUILD files at the same time it is adjusting config in pyproject.toml.

  2. Poetry is also written in python, so you could actually just reuse poetry's pyproject.toml / poetry.lock parsing logic. It looks like that is in the poetry-core library (thanks to OdobenusMaximus in the poetry discourse channel for pointing that out to me).

@Eric-Arellano
Copy link
Contributor Author

Thanks @cognifloyd! #2 is particularly compelling, that sounds great and you can add the library to 3rdparty/python/requirements.txt.

#1 looks interesting, although is likely out of scope. It's much trickier to teach Poetry about Pants imo than to teach Pants how to read Poetry files via poetry-core.

@lucinvitae
Copy link

@Eric-Arellano and @stuhood, thanks for the replies and detailed instructions. Replies/comments below:

We will be creating a macro, very similar to the python_requirements() and pipenv_requirements() macros.

e.g. poetry_requirements()?

Is this lossiness okay with what you're looking for?

I'm not sure yet. The "monorepo" approach requires more investigation as to whether or not it suits our internal project's needs. We're just now adding our first source-level dependencies to the internal project. Whether we lose the "hash"-level of detail for the project may not be a huge concern, but we'll see.

As for parsing:

The parsing will probably be the trickiest part. We do have access to the toml library. It'd be valuable to see prior art from other tools, including how Poetry itself consumes this file. Definitely do feel free to talk through you're thinking on parsing before getting too far into coding - it's the biggest unknown imo.

It looks like poetry-core is vendoring its dependencies here, which appears to be a subject of some contention, as seen in python-poetry/poetry#2346. One vendored dependency includes tomlkit, which is used as a base library for parsing the pyproject.toml file (here). The lockfile parsing logic is as of this comment defined in the main poetry repo instead. Currently the class that includes the primary lockfile parsing logic is Locker. Some notes, mostly for myself:

  • Package parsing appears to be handled by the Package class.
  • Dependency and version parsing is mostly handled by Dependency (supports PEP 508 dep declarations).
  • Locker.locked_repository() does the heavy lifting, parsing the packages for each entry in the toml file with the section [[package]] key, as well as dependencies for each package.

There's some discussion to bring poetry.lock support into poetry-core here: python-poetry/poetry-core#83.

@Eric-Arellano
Copy link
Contributor Author

Thanks for all that research @lucinvitae! I think it's a non-starter for us to depend on Poetry itself, and we do need to parse poetry.lock to get the constraints file. So, sounds like we need to do our own TOML reading one way or another. At that point, I think we can leave off poetry-core. The spec looks pretty simple to consume for both pyproject.toml and poetry.lock fwict, and we can of course always look to Poetry's code for how they do it. What do you think?

Is this lossiness okay with what you're looking for?

@jsirois mentioned it may be possible to preserve the hashes with Pip and Pex. I think it's out of scope for this ticket, but a possible followup.

e.g. poetry_requirements()?

I recommend calling it poetry_project(), where it's looking both at the top-level requirements and the lockfile for constraints.

--

We've had a couple more users ask for this Poetry integration in the past two weeks, so it's becoming a higher priority for the project and I'd love to help land this.

I'm wondering, how would it sound for you to focus on writing some generic code that reads poetry.lock and/or pyproject.toml, and then I can focus on wiring that up to Pants via the macro? Or perhaps if there are other ways I could help, like focusing on one of the two files while you look at the other? I'm happy to help however :)

@jsirois
Copy link
Contributor

jsirois commented Apr 1, 2021

@jsirois mentioned it may be possible to preserve the hashes with Pip and Pex. I think it's out of scope for this ticket, but a possible followup.

It works straight out of the gate in Pex, so if Pants doesn't get in the way and you're already generating a requirements.txt as part of this work, you could just add in hash checking from the get go as shown here: pex-tool/pex#1310

@lucinvitae
Copy link

@Eric-Arellano to answer your question about consuming the lockfile:

I think using custom logic instead of adding a dependency on poetry from pants sounds reasonable. And poetry_project() seems like a decent name.

Regarding tackling the implementation:

I'm wondering, how would it sound for you to focus on writing some generic code that reads poetry.lock and/or pyproject.toml, and then I can focus on wiring that up to Pants via the macro? Or perhaps if there are other ways I could help, like focusing on one of the two files while you look at the other? I'm happy to help however :)

My team is still evaluating a monorepo architecture and hasn't committed to using Pants, so I probably can't commit to delivering this feature by a certain date. If you're comfortable with potentially a long wait, I think your proposal that I write some generic lockfile code would work. Or if - due to increased priority from users requesting this feature - you want to self-assign this issue, please do!

@Eric-Arellano
Copy link
Contributor Author

I think using custom logic instead of adding a dependency on poetry from pants sounds reasonable.

Great!

so I probably can't commit to delivering this feature by a certain date. If you're comfortable with potentially a long wait, I think your proposal that I write some generic lockfile code would work. Or if - due to increased priority from users requesting this feature - you want to self-assign this issue, please do!

Okay, sounds good. I'll try to land this soon, hopefully before y'all make a decision so you can see how the integration would work out for you :)

My team is still evaluating a monorepo architecture and hasn't committed to using Pants

Please feel free to let us know if there's anything we can help with or you have any questions! Folks are quite friendly on Slack and sometimes share their experience with monorepos, for example. https://www.pantsbuild.org/docs/getting-help#slack

@Eric-Arellano Eric-Arellano self-assigned this Apr 6, 2021
@wilsonliam
Copy link
Contributor

PR #12174 adds parsing for a Poetry-style pyproject.toml file...it's more or less complete; still welcoming input, though. Estimating ~1 day for code review before landing.

@stuhood
Copy link
Sponsor Member

stuhood commented Jun 17, 2021

Sounds great! Maybe also some time for documentation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants