Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement layout="zip" for Lambda/GCF, skipping lambdex #19022

Merged
merged 10 commits into from
May 20, 2023

Conversation

huonw
Copy link
Contributor

@huonw huonw commented May 17, 2023

This fixes #18879 by allowing the python_awslambda and python_google_cloud_function FaaS artefacts to be generated in "simple" format, using the pex3 venv create --layout=flat-zipped functionality recently added in PEX 2.1.135 (https://github.com/pantsbuild/pex/releases/tag/v2.1.135). This format is just: put everything at the top-level, e.g. the zip contains cowsay/__init__.py etc., rather than .deps/cowsay-....whl (plus the dynamic PEX initialisation).

This shifts the dynamic dependency computation/extraction/layout from run-time to build-time, relying on the FaaS environment to be generally consistent. It shouldn't change what actually happens after initialisation. This can:

  • reduce cold-starts noticeably: for instance, some of our lambdas spend 1s doing PEX/Lambdex start up.
  • reduce package size somewhat (the PEX .bootstrap/ folder seems to be about 2MB uncompressed, ~1MB compressed).
  • increase build times.

For instance, for one Python 3.9 Lambda in our codebase:

metric before after
init time on cold start 2.3-2.5s 1.3-1.4s (-1s)
compressed size 24.6MB 23.8MB (-0.8MB)
uncompressed size 117.8MB 115.8MB (-2.0MB)
PEX-construction build time ~5s ~5s
PEX-postprocessing build time 0.14s 4.8s

(The PEX-postprocessing time metric is specifically the time to run the Setting up handler (lambdex) or Build python_awslambda (pex3 venv create) process, computed by running pants --keep-sandboxes=always package ... for each layout, and then hyperfine -r3 -w1 path/to/first/__run.sh path/to/second/__run.sh. This doesn't include the time to construct the input PEX, which is the same for both.)

This functionality is driven by adding a new layout field. It defaults to lambdex (retaining the current code paths), but also supports zip, which keys into the functionality above. I've tried to keep the non-lambdex implementation generally separate to the lambdex one, rather than reusing all of the code that happens to be common currently, because it'd make sense to deprecate/remove the lambdex functionality and thus I feel it's best for this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. It comes in three phases:

  1. Add the pex_venv.py util rules for running pex3 venv create .... Currently this only supports a limited subset of the functionality there, but can presumably be expanded freely as required. (First commit)
  2. Do some minor refactoring. (Commits labelled "refactor: ...")
  3. Draw the rest of the owl. (The others.)

I think this is an acceptable MVP for this functionality, but there's various bits of follow-up:

@huonw huonw requested a review from benjyw May 17, 2023 11:37


@dataclass(frozen=True)
class PexVenv:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this name will be confusing with VenvPex, but I lost inspiration, let me know if you have a better name.

Comment on lines +499 to +503
pex_request = PexFromTargetsRequest(
addresses=[request.address],
internal_only=False,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unnecessarily heavyweight, and having to materialise the full non-internal PEX just to throw it away is presumably a major reason why this new approach is slower (since I imagine there's duplicated work between "build a PEX" and "turn that PEX into a venv"), related to the discussion in the PR.

However, this needs to be built for a potentially-foreign platform, and hence, AIUI, internal_only=True and VenvPex don't work.

@benjyw
Copy link
Sponsor Contributor

benjyw commented May 17, 2023

Hooray! After this I think we should deprecate lambdex and get rid of it. Will review asap.

# The AWS-facing handler function is always lambdex_handler.handler, which is the
# wrapper injected by lambdex that manages invocation of the actual handler.
handler_log_message="lambdex_handler.handler",
# this isn't built or run via lambdex, but the name is arbitrary, so we stick with the
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good opportunity to get rid of this name, since we're churning anyway. It'll be harder to do later.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, are we sure we want this at all? Why not use the handler name the user configured directly?

Copy link
Contributor Author

@huonw huonw May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, are we sure we want this at all? Why not use the handler name the user configured directly?

Yes, I think so. The BUILD file target must contain the entry point (at least the file/module part of entry_point="path/to.py:handler"), in order to compute dependencies/build the package, and it's annoying to have to carefully reproduce that elsewhere (handler: 'parent.modules.path.to.handler' in templates/IaC code). Synthesising a fixed module avoids that, and is all info pants already knows.

For instance, in our repo, we use AWS CDK, and have a generic PythonLambda construct where the basic "get the code running" config is just pointing it to the package (e.g. foo.bar/target.zip), because it can just set handler: 'lambdex_handler.handler' globally, not requiring customisation per lambda. I think this is really nice!

(There's obviously often per-lambda config required like any extra env vars or permissions, but CDK/the infra configuration is the source of truth there, it's not just directly duplicated from the BUILD files.)

I think this is a good opportunity to get rid of this name, since we're churning anyway. It'll be harder to do later.

Sure, I'm happy to choose another name, e.g. some options:

I think we can also make this customisable in future (including the ability to turn it off), e.g. synthetic_handler_module="my_custom_name" or synthetic_handler_module=None, if required/desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone with lambda_function, but more than happy to change. (Thought of another option too: main to be consistent with GCF.)

src/python/pants/backend/python/util_rules/faas.py Outdated Show resolved Hide resolved
src/python/pants/backend/python/util_rules/faas.py Outdated Show resolved Hide resolved
@huonw huonw merged commit 2bc078b into pantsbuild:main May 20, 2023
@huonw huonw deleted the feature/18879-lambda-layout branch May 20, 2023 02:07
huonw added a commit that referenced this pull request May 20, 2023
huonw added a commit that referenced this pull request May 21, 2023
)

Reverts #19022.

Per
#19032 (comment),
we're going to change the approach for moving away from Lambdex to a
global setting, rather than the per-target `layout` field added in
#19022. That change will simplify things, and involve an extra moving
part that will be cherry-picked to 2.17, so I think it's better to start
from scratch, and then more carefully cherry-pick the parts of #19022
that are relevant.
thejcannon pushed a commit that referenced this pull request May 23, 2023
This fixes #18879 by allowing the `python_awslambda` and
`python_google_cloud_function` FaaS artefacts to be generated in
"simple" format, using the `pex3 venv create --layout=flat-zipped`
functionality recently added in PEX 2.1.135
(https://github.com/pantsbuild/pex/releases/tag/v2.1.135).

This format is just: put everything at the top-level. For instance, the
zip contains `cowsay/__init__.py` etc., rather than
`.deps/cowsay-....whl`. This avoids the need to do the dynamic PEX
initialisation/venv creation.

This shifts the dynamic dependency computation/extraction/layout from
run-time to build-time, relying on the FaaS environment to be generally
consistent. It shouldn't change what actually happens after
initialisation. This can:

- reduce cold-starts noticeably: for instance, some of our lambdas spend
1s doing PEX/Lambdex start up.
- reduce package size somewhat (the PEX `.bootstrap/` folder seems to be
about 2MB uncompressed, ~1MB compressed).
- increase build times.
 
For instance, for one Python 3.9 Lambda in our codebase:

| metric | before | after |
|---|---|---|
| init time on cold start | 2.3-2.5s | 1.3-1.4s (-1s) |
| compressed size |  24.6MB | 23.8MB (-0.8MB) |
| uncompressed size | 117.8MB | 115.8MB (-2.0MB) |
| PEX-construction build time | ~5s | ~5s |
| PEX-postprocessing build time | 0.14s | 4.8s |

(The PEX-postprocessing time metric is specifically the time to run the
`Setting up handler` (lambdex) or `Build python_awslambda` (`pex3 venv
create`) process, computed by running `pants --keep-sandboxes=always
package ...` for each layout, and then `hyperfine -r3 -w1
path/to/first/__run.sh path/to/second/__run.sh`. This _doesn't_ include
the time to construct the input PEX, which is the same for both.)

---

This functionality is driven by adding a new option to the
`[lambdex].layout` option added in #19074. In #19074 (targeted for
2.17), it defaults `lambdex` (retaining the current code paths). This PR
flips the default to the new option `zip`, which keys into the
functionality above. I've tried to keep the non-lambdex implementation
generally separate to the lambdex one, rather than reusing all of the
code that happens to be common currently, because it'd make sense to
deprecate/remove the lambdex functionality and thus I feel it's best for
this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. 

I _think_ this is an acceptable MVP for this functionality, but there's
various bits of follow-up:

- add a warning about `files` being loaded into these packages, which
has been temporarily lost (#19027)
- adjust documentation #19067
- other improvements like #18195 and #18880 
- improve performance, e.g. potentially `pex3 venv create ...` could use
the lock file and sources to directly compute the appropriate files,
without having to materialise a normal pex first

This is a re-doing of #19022 with a simpler approach to deprecation, as
discussed in
#19074 (comment)
and
#19032 (comment).
The phasing will be:

| release | supports lambdex? | supports zip? | default layout |
deprecation warnings |
|---|---|---|---|---|
| 2.17 (this PR) | ✅ | ✅ | lambdex | if `layout = "lambdex"` is
implicit, tell people to set it: recommend `zip`, but allow `lambdex` if
they have to |
| 2.18 | ✅ | ✅ | zip | if `layout = "lambdex"` is set at all, tell
people to remove it and switch to `zip` |
| 2.19 | ❌ | ✅ | zip | none, migration over (or maybe just about
removing the `[lambdex]` section entirely) |
WorkerPants pushed a commit that referenced this pull request May 23, 2023
This fixes #18879 by allowing the `python_awslambda` and
`python_google_cloud_function` FaaS artefacts to be generated in
"simple" format, using the `pex3 venv create --layout=flat-zipped`
functionality recently added in PEX 2.1.135
(https://github.com/pantsbuild/pex/releases/tag/v2.1.135).

This format is just: put everything at the top-level. For instance, the
zip contains `cowsay/__init__.py` etc., rather than
`.deps/cowsay-....whl`. This avoids the need to do the dynamic PEX
initialisation/venv creation.

This shifts the dynamic dependency computation/extraction/layout from
run-time to build-time, relying on the FaaS environment to be generally
consistent. It shouldn't change what actually happens after
initialisation. This can:

- reduce cold-starts noticeably: for instance, some of our lambdas spend
1s doing PEX/Lambdex start up.
- reduce package size somewhat (the PEX `.bootstrap/` folder seems to be
about 2MB uncompressed, ~1MB compressed).
- increase build times.
 
For instance, for one Python 3.9 Lambda in our codebase:

| metric | before | after |
|---|---|---|
| init time on cold start | 2.3-2.5s | 1.3-1.4s (-1s) |
| compressed size |  24.6MB | 23.8MB (-0.8MB) |
| uncompressed size | 117.8MB | 115.8MB (-2.0MB) |
| PEX-construction build time | ~5s | ~5s |
| PEX-postprocessing build time | 0.14s | 4.8s |

(The PEX-postprocessing time metric is specifically the time to run the
`Setting up handler` (lambdex) or `Build python_awslambda` (`pex3 venv
create`) process, computed by running `pants --keep-sandboxes=always
package ...` for each layout, and then `hyperfine -r3 -w1
path/to/first/__run.sh path/to/second/__run.sh`. This _doesn't_ include
the time to construct the input PEX, which is the same for both.)

---

This functionality is driven by adding a new option to the
`[lambdex].layout` option added in #19074. In #19074 (targeted for
2.17), it defaults `lambdex` (retaining the current code paths). This PR
flips the default to the new option `zip`, which keys into the
functionality above. I've tried to keep the non-lambdex implementation
generally separate to the lambdex one, rather than reusing all of the
code that happens to be common currently, because it'd make sense to
deprecate/remove the lambdex functionality and thus I feel it's best for
this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. 

I _think_ this is an acceptable MVP for this functionality, but there's
various bits of follow-up:

- add a warning about `files` being loaded into these packages, which
has been temporarily lost (#19027)
- adjust documentation #19067
- other improvements like #18195 and #18880 
- improve performance, e.g. potentially `pex3 venv create ...` could use
the lock file and sources to directly compute the appropriate files,
without having to materialise a normal pex first

This is a re-doing of #19022 with a simpler approach to deprecation, as
discussed in
#19074 (comment)
and
#19032 (comment).
The phasing will be:

| release | supports lambdex? | supports zip? | default layout |
deprecation warnings |
|---|---|---|---|---|
| 2.17 (this PR) | ✅ | ✅ | lambdex | if `layout = "lambdex"` is
implicit, tell people to set it: recommend `zip`, but allow `lambdex` if
they have to |
| 2.18 | ✅ | ✅ | zip | if `layout = "lambdex"` is set at all, tell
people to remove it and switch to `zip` |
| 2.19 | ❌ | ✅ | zip | none, migration over (or maybe just about
removing the `[lambdex]` section entirely) |
huonw added a commit to huonw/pants that referenced this pull request May 23, 2023
…d#19076)

This fixes pantsbuild#18879 by allowing the `python_awslambda` and
`python_google_cloud_function` FaaS artefacts to be generated in
"simple" format, using the `pex3 venv create --layout=flat-zipped`
functionality recently added in PEX 2.1.135
(https://github.com/pantsbuild/pex/releases/tag/v2.1.135).

This format is just: put everything at the top-level. For instance, the
zip contains `cowsay/__init__.py` etc., rather than
`.deps/cowsay-....whl`. This avoids the need to do the dynamic PEX
initialisation/venv creation.

This shifts the dynamic dependency computation/extraction/layout from
run-time to build-time, relying on the FaaS environment to be generally
consistent. It shouldn't change what actually happens after
initialisation. This can:

- reduce cold-starts noticeably: for instance, some of our lambdas spend
1s doing PEX/Lambdex start up.
- reduce package size somewhat (the PEX `.bootstrap/` folder seems to be
about 2MB uncompressed, ~1MB compressed).
- increase build times.
 
For instance, for one Python 3.9 Lambda in our codebase:

| metric | before | after |
|---|---|---|
| init time on cold start | 2.3-2.5s | 1.3-1.4s (-1s) |
| compressed size |  24.6MB | 23.8MB (-0.8MB) |
| uncompressed size | 117.8MB | 115.8MB (-2.0MB) |
| PEX-construction build time | ~5s | ~5s |
| PEX-postprocessing build time | 0.14s | 4.8s |

(The PEX-postprocessing time metric is specifically the time to run the
`Setting up handler` (lambdex) or `Build python_awslambda` (`pex3 venv
create`) process, computed by running `pants --keep-sandboxes=always
package ...` for each layout, and then `hyperfine -r3 -w1
path/to/first/__run.sh path/to/second/__run.sh`. This _doesn't_ include
the time to construct the input PEX, which is the same for both.)

---

This functionality is driven by adding a new option to the
`[lambdex].layout` option added in pantsbuild#19074. In pantsbuild#19074 (targeted for
2.17), it defaults `lambdex` (retaining the current code paths). This PR
flips the default to the new option `zip`, which keys into the
functionality above. I've tried to keep the non-lambdex implementation
generally separate to the lambdex one, rather than reusing all of the
code that happens to be common currently, because it'd make sense to
deprecate/remove the lambdex functionality and thus I feel it's best for
this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. 

I _think_ this is an acceptable MVP for this functionality, but there's
various bits of follow-up:

- add a warning about `files` being loaded into these packages, which
has been temporarily lost (pantsbuild#19027)
- adjust documentation pantsbuild#19067
- other improvements like pantsbuild#18195 and pantsbuild#18880 
- improve performance, e.g. potentially `pex3 venv create ...` could use
the lock file and sources to directly compute the appropriate files,
without having to materialise a normal pex first

This is a re-doing of pantsbuild#19022 with a simpler approach to deprecation, as
discussed in
pantsbuild#19074 (comment)
and
pantsbuild#19032 (comment).
The phasing will be:

| release | supports lambdex? | supports zip? | default layout |
deprecation warnings |
|---|---|---|---|---|
| 2.17 (this PR) | ✅ | ✅ | lambdex | if `layout = "lambdex"` is
implicit, tell people to set it: recommend `zip`, but allow `lambdex` if
they have to |
| 2.18 | ✅ | ✅ | zip | if `layout = "lambdex"` is set at all, tell
people to remove it and switch to `zip` |
| 2.19 | ❌ | ✅ | zip | none, migration over (or maybe just about
removing the `[lambdex]` section entirely) |
huonw added a commit that referenced this pull request May 23, 2023
…ck of #19076) (#19120)

This fixes #18879 by allowing the `python_awslambda` and
`python_google_cloud_function` FaaS artefacts to be generated in
"simple" format, using the `pex3 venv create --layout=flat-zipped`
functionality recently added in PEX 2.1.135
(https://github.com/pantsbuild/pex/releases/tag/v2.1.135).

This format is just: put everything at the top-level. For instance, the
zip contains `cowsay/__init__.py` etc., rather than
`.deps/cowsay-....whl`. This avoids the need to do the dynamic PEX
initialisation/venv creation.

This shifts the dynamic dependency computation/extraction/layout from
run-time to build-time, relying on the FaaS environment to be generally
consistent. It shouldn't change what actually happens after
initialisation. This can:

- reduce cold-starts noticeably: for instance, some of our lambdas spend
1s doing PEX/Lambdex start up.
- reduce package size somewhat (the PEX `.bootstrap/` folder seems to be
about 2MB uncompressed, ~1MB compressed).
- increase build times.
 
For instance, for one Python 3.9 Lambda in our codebase:

| metric | before | after |
|---|---|---|
| init time on cold start | 2.3-2.5s | 1.3-1.4s (-1s) |
| compressed size |  24.6MB | 23.8MB (-0.8MB) |
| uncompressed size | 117.8MB | 115.8MB (-2.0MB) |
| PEX-construction build time | ~5s | ~5s |
| PEX-postprocessing build time | 0.14s | 4.8s |

(The PEX-postprocessing time metric is specifically the time to run the
`Setting up handler` (lambdex) or `Build python_awslambda` (`pex3 venv
create`) process, computed by running `pants --keep-sandboxes=always
package ...` for each layout, and then `hyperfine -r3 -w1
path/to/first/__run.sh path/to/second/__run.sh`. This _doesn't_ include
the time to construct the input PEX, which is the same for both.)

---

This functionality is driven by adding a new option to the
`[lambdex].layout` option added in #19074. In #19074 (targeted for
2.17), it defaults `lambdex` (retaining the current code paths). This PR
flips the default to the new option `zip`, which keys into the
functionality above. I've tried to keep the non-lambdex implementation
generally separate to the lambdex one, rather than reusing all of the
code that happens to be common currently, because it'd make sense to
deprecate/remove the lambdex functionality and thus I feel it's best for
this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. 

I _think_ this is an acceptable MVP for this functionality, but there's
various bits of follow-up:

- add a warning about `files` being loaded into these packages, which
has been temporarily lost (#19027)
- adjust documentation #19067
- other improvements like #18195 and #18880 
- improve performance, e.g. potentially `pex3 venv create ...` could use
the lock file and sources to directly compute the appropriate files,
without having to materialise a normal pex first

This is a re-doing of #19022 with a simpler approach to deprecation, as
discussed in
#19074 (comment)
and
#19032 (comment).
The phasing will be:

| release | supports lambdex? | supports zip? | default layout | deprecation warnings |
|---|---|---|---|---|
| 2.17 (this PR) | ✅ | ✅ | lambdex | if `layout = "lambdex"` is implicit, tell people to set it: recommend `zip`, but allow `lambdex` if they have to |
| 2.18 | ✅ | ✅ | zip | if `layout = "lambdex"` is set at all, tell people to remove it and switch to `zip` |
| 2.19 | ❌ | ✅ | zip | none, migration over (or maybe just about removing the `[lambdex]` section entirely) |
cognifloyd pushed a commit that referenced this pull request May 25, 2023
There's several places that check whether a PEX transitively depends on
any `files()` targets, because those are (currently) not included in the
PEX. This moves the warning into the "pex from targets" builder, to not
have to reimplement it everywhere.

This is follow up to #19022, and restores the warning for
`layout="zip"`, since I didn't implement it there.
github-actions bot pushed a commit that referenced this pull request May 25, 2023
There's several places that check whether a PEX transitively depends on
any `files()` targets, because those are (currently) not included in the
PEX. This moves the warning into the "pex from targets" builder, to not
have to reimplement it everywhere.

This is follow up to #19022, and restores the warning for
`layout="zip"`, since I didn't implement it there.
cognifloyd pushed a commit that referenced this pull request May 25, 2023
…19148)

There's several places that check whether a PEX transitively depends on
any `files()` targets, because those are (currently) not included in the
PEX. This moves the warning into the "pex from targets" builder, to not
have to reimplement it everywhere.

This is follow up to #19022, and restores the warning for
`layout="zip"`, since I didn't implement it there.

Co-authored-by: Huon Wilson <huon@exoflare.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Construct Python serverless artefacts (AWS Lambda etc.) in expected layout, without PEX
2 participants