Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for packaging python AWS Lambda layers #19123

Merged
merged 4 commits into from
May 30, 2023

Conversation

huonw
Copy link
Contributor

@huonw huonw commented May 23, 2023

This fixes #18880 by making pants able to create a AWS Lambda package with the layout expected by a "Layer" (https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html). For Python, this the same as a normal Lambda, except within a python/: python/cowsay/__init__.py will be importable via import cowsay.

The big win here is easily separating requirements from sources:

# path/to/BUILD
python_sources()

python_awslambda(
  name="lambda",
  entry_point="./foo.py:handler",
  include_requirements=False
)
python_aws_lambda_layer(
  name="layer",
  dependencies=["./foo.py"],
  include_requirements=True,
  include_sources=False,
)

Packaging that will result in path.to/lambda.zip that contains only first-party sources, and path.to/layer.zip that contains only third-party requirements. This results in faster builds, less cache usage, and smaller deploy packages when only changing first-party sources. This side-steps some of the slowness in #19076. For the example in that PR:

metric Lambdex zip, no layers using zip + layers (this PR)
init time on cold start 2.3-2.5s 1.3-1.4s not yet tested
compressed size 24.6MB 23.8MB 28KB (lambda), 23.8MB (layer)
uncompressed size 117.8MB 115.8MB 72KB (lambda), 115.7MB (layer)
PEX-construction build time ~5s ~5s ~1.5s (lambda), ~5s (layer)
PEX-postprocessing build time 0.14s 4.8s 1s (lambda), ~5s (layer)

That is, the first-party-only lambda package ~1000× smaller, and is ~2× faster to build than even the Lambdex version.

This uses a separate target, python_aws_lambda_layer. The target has its inputs configured using the dependencies=[...] field. For instance, the example above is saying create a lambda using all of the third-party requirements required (transitively) by ./foo.py, and none of the first-party sources.

(The initial implementation just added a layout="layer" option to the existing python_awslambda target, but, per the discussion in this PR, that was deemed unnecessarily confusing, e.g. it'd keep the handler field around, which is meaningless for a layer.)

Follow-up not handled here:

  • for 2.18:
    • documentation
    • renaming the python_awslambda target to python_aws_lambda_function to be clearer (NB. I proposing also taking the chance to add an underscore in the name, which I've done with the python_aws_lambda_layer target. Let me know if there's a strong reason for the awslambda name without an underscore. I note the GCF target is python_google_cloud_function)
  • not necessarily for 2.18:
    • potentially, adding "sizzle" like the ability to have python_aws_lambda_function specify which layers it will be used with, and thus automatically exclude the contents of the layer from the function (e.g. the example above could hypothetically replace include_requirements=False with layers=[":layer"])

The commits are individually reviewable.

@huonw huonw marked this pull request as ready for review May 23, 2023 23:24
@huonw huonw requested a review from benjyw May 23, 2023 23:24
@huonw huonw changed the title Add support for Lambda layers Add support for packaging python AWS Lambda layers May 23, 2023
Copy link
Sponsor Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, yay! Thanks for making our Lambda support miles better.

The implementation looks great to me. In response to your question:

It does seem weird for a layer to have an entry point, and I didn't quite get your point about macros, or why in your example the layer would care about foo.py at all (assuming that is the entry point). Surely the layer doesn't know about entry points that use it?

It seems like "a lambda uses a layer" is modeled by a dep, not an entry point. And it would be great for Pants to provide lambda publishing that prevents the user from having to generate metadata that just recapitulates these dependencies.

But there are a few caveats:

  1. Order does matter for layers - they are merged in the order in which they are specified in the function's metadata, and so to create that metadata we would have to preserve the order. Does the dep field preserve order?

  2. A function can only have 5 layers (not sure if that includes the function itself).

  3. There's no transitivity - a layer can't depend on other layers.

So we'd have to do some validation if we model these as deps. The alternative is to give the lambda target a field that is a flat list of layers.

And in the mix is dep inference - is that likely to be possible or useful here?

src/python/pants/backend/awslambda/python/target_types.py Outdated Show resolved Hide resolved
@huonw
Copy link
Contributor Author

huonw commented May 26, 2023

It does seem weird for a layer to have an entry point, and I didn't quite get your point about macros, or why in your example the layer would care about foo.py at all (assuming that is the entry point). Surely the layer doesn't know about entry points that use it?

Ah, I think you're thinking about this more expansively than me: I was just thinking of Layers as just another 'leaf' package type that's independent of any particular Lambda, and it's up to users to create them and deploy them in the right way...

One way to do that would be to have macro lambda_with_layer (or whatever) that splits each given lambda into two targets:

  • a Layer for all third party deps
  • a Function for all first party source
# macros.py
def lambda_with_layer(*, name, **kwargs):
  python_awslambda(
    name=name,
    **kwargs,
    include_requirements=False
  )
  python_awslambda(
    name=f"{name}-layer",
    **kwargs,
    layout="layer",
  )
# BUILD
python_sources()

lambda_with_layer(
   name="foo-function",
   entry_point="./foo.py:handler"
)

Then, the deploy infra (Terraform or CDK or whatever) would deploy a lambda by referring to the appropriate per-Lambda Layer package and the source Lambda package (e.g. in my work repo, we'd have a CDK construct that takes the name and constructs a Layer with {name}-layer.zip and a Function with the {name}.zip, and configures the Function to use the Layer).

For this particular use case, it's convenient to be able to specify what to build for the Layer in the same way as for the Lambda, so the macro can take entry_point="./foo.py:func and pass that to both targets. That is, the layer doesn't really have an entry point, it's just saying "build a layer using this entry point as the starting point for what to include".

It seems like "a lambda uses a layer" is modeled by a dep, not an entry point. And it would be great for Pants to provide lambda publishing that prevents the user from having to generate metadata that just recapitulates these dependencies.

I wasn't thinking of the connection between layer and function as a pants level thing... and I'm not sure it fully can be, as I think pants can't/won't be inserting itself into the deploy pipeline to deploy a Layer and then add it to the appropriate Functions... As in, I'm not sure how to translate a pants-level dependency between a Function and a Layer into a deployed one, other than a human writing that logic into their deploy infra manually? (At least, not for 'interesting' deploys, e.g. a lambda as part of a broader CDK/CloudFormation stack)

But some sort of pants smarts does seem very valuable. E.g. hypothetically pants could exclude from the lambda (or other layers) any sources/requirements that have already been included in earlier layers:

# a 'core' layer that contains things used by literally every lambda
python_awslambda(
  name="core-layer",
  dependencies=["boto3", "some-logging-library"]
  layout="layer",
)

# any extra dependencies required for foo.py
python_awslambda(
   name="specific-layer",
   entry_point="./foo.py:func",
   layers=[":core-layer"], # pants-level transitive dependency, flattened into any users of :specific-layer
   layout="layer",
)

python_awslambda(
  name="lambda"
  entry_point="./foo.py:func",
  layers=[":specific-layer"], # automatically exclude boto3 or anything else in :core-layer or :specific-layer
)

(Still using the single-target and entry_point phrasing, but that can be changed.)

In any case, I feel that might be worth considering as a 'value add'/follow-up on top of the raw layer building functionality, since e.g. someone might want to be publishing layers directly for others to consume, or otherwise need fine-grained control.

But there are a few caveats:

I think, mostly, these have to be managed by users: it's up to them to arrange their layers to not need more than 6 layer (i.e. if they split it up to much, they'll have to unspilt), and similarly for putting them in the correct order.

And in the mix is dep inference - is that likely to be possible or useful here?

I'm not sure how one might infer dependencies usefully, because the breakdown of packages into Layers/Functions feels like a human-level thing, but I'm more than happy to hear ideas!

@benjyw
Copy link
Sponsor Contributor

benjyw commented May 26, 2023

Got it, yeah, fair to just have a layer be a standalone leaf thing and it's up to the user to combine layers into a function in the lambda metadata. We can defer any further cleverness to future improvements.

But I still don't see why the layer needs an entry point field. A macro can still create one, no? It would just have to slightly futz with the kwargs, which is fine. I'd be reluctant to have inaccurate modeling just for the convenience of a high-level macro.

So I think maybe a new target type does make sense (it's composed of the same fields, except for entry point, so it's not a lot of boilerplate): python_awslambda_layer, and python_awslambda_function can be an alias for python_awslambda until we deprecate that name?

@huonw
Copy link
Contributor Author

huonw commented May 26, 2023

Sounds good.

I guess a macro can take two arguments and compose them:

# macros.py
def lambda_with_layer(*, name, entry_point_file, entry_point_func, **kwargs):
  python_awslambda(
    name=name,
    entry_point="{entry_point_file}:{entry_point_func}",
    **kwargs,
    include_requirements=False
  )
  python_awslambda_layer(
    name=f"{name}-layer",
    **kwargs,
    dependencies=[entry_point_file],
  )

So the outcome of this might be:

  1. separate target for layers
  2. a layer is constructed only using inputs from (transitive) dependencies without a dedicated field for "entry point" or whatever
  3. Follow-up: rename the function target

I think this will mean the Python import path (module.path:func) style of entry point specification won't work for layers: a macro like the above will always have to use entry_point_file="./module/path.py", because dependencies=["module.path"] doesn't work (I think?). This is probably okay!

@benjyw
Copy link
Sponsor Contributor

benjyw commented May 26, 2023

I guess a macro can take two arguments and compose them:

That, or the macro can copy and modify the kwargs dict.

I think this will mean the Python import path (module.path:func) style of entry point specification won't work for layers

Now I'm confused again. Why does a layer need an entry point specification? This seems like the one thing that a function has and a layer doesn't. But in any case, the module:func should surely work?

@huonw
Copy link
Contributor Author

huonw commented May 26, 2023

Ah sorry, I meant "specifying where to start when computing the transitive set of sources/requirements", provided via the layer target's dependencies, as in the macro example.

@benjyw
Copy link
Sponsor Contributor

benjyw commented May 27, 2023

Ah sorry, I meant "specifying where to start when computing the transitive set of sources/requirements", provided via the layer target's dependencies, as in the macro example.

I think just start with all the explicit dependencies, no?

@huonw huonw force-pushed the feature/18880-lambda-layers branch 3 times, most recently from ba36803 to 87c7f1d Compare May 28, 2023 11:43
Copy link
Contributor Author

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to introduce a new python_aws_lambda_layer target. This feels better, other than the implementation being a bit non-DRY. Thanks for brainstorming.

I've rewritten the PR description to match this new state.



@dataclass(frozen=True)
class PythonAwsLambdaLayerFieldSet(PackageFieldSet):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the rules below) end up rather non-DRY, but the repetition is generally uninteresting (just splat the fieldset field into the matching request field). Let me know if there's a good way to DRY it more.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared base class for the FieldSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done that, although the more interesting duplication is constructing the BuildPythonFaaSRequest and I don't know of a good way to DRY that.

PythonAwsLambdaIncludeRequirements,
PythonAwsLambdaIncludeSources,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it didn't hurt to have a include_sources= field on python_awslambda functions in addition to layers, so this is in the shared fields. Let me know if you'd prefer it to be specific to layers.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit weird not to, I think? What would it mean to have a function without sources?

Copy link
Contributor Author

@huonw huonw May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one could have sources in a layer and only (a single?) third party dep in a layer, and the specify something in env vars to direct that dep to call into the sources...

That's pretty weird, and I don't know of any concrete example that actually does this (or why one would want the third-party dep in the function without the sources...). I've moved include_sources to just python_aws_lambda_layer to reduce the permutations/simplify the docs.

@huonw huonw requested a review from benjyw May 28, 2023 11:44
@huonw huonw force-pushed the feature/18880-lambda-layers branch from 87c7f1d to 33de0bb Compare May 28, 2023 22:58
Copy link
Sponsor Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

@huonw huonw merged commit 318c853 into pantsbuild:main May 30, 2023
@huonw huonw deleted the feature/18880-lambda-layers branch May 30, 2023 05:48
huonw added a commit that referenced this pull request May 30, 2023
This adjusts the AWS Lambda and Google Cloud Function documentation for
the new Zip layout, added in #19076 and targeted for 2.17.

This PR is just what's required for 2.17, ready to cherry-pick. The
"Migrating" section is written with this in mind. It will require
adjustment for 2.18 to reflect the change in defaults, and, hopefully,
support for AWS Lambda Layers (#18880, #19123).

Fixes #19067
huonw added a commit to huonw/pants that referenced this pull request May 30, 2023
This adjusts the AWS Lambda and Google Cloud Function documentation for
the new Zip layout, added in pantsbuild#19076 and targeted for 2.17.

This PR is just what's required for 2.17, ready to cherry-pick. The
"Migrating" section is written with this in mind. It will require
adjustment for 2.18 to reflect the change in defaults, and, hopefully,
support for AWS Lambda Layers (pantsbuild#18880, pantsbuild#19123).

Fixes pantsbuild#19067
huonw added a commit that referenced this pull request May 31, 2023
This adjusts the AWS Lambda and Google Cloud Function documentation for
the new Zip layout, added in #19076 and targeted for 2.17.

This PR is just what's required for 2.17, ready to cherry-pick. The
"Migrating" section is written with this in mind. It will require
adjustment for 2.18 to reflect the change in defaults, and, hopefully,
support for AWS Lambda Layers (#18880, #19123).

Fixes #19067
huonw added a commit that referenced this pull request Jun 1, 2023
This renames the `python_awslambda` target to
`python_aws_lambda_function`. This is to reduce ambiguity now that Pants
has support for building layers via `python_aws_lambda_layer` (#19123),
in addition to functions, and also aligns with the
`python_google_cloud_function` target too.

This change has a single release deprecation since it's easy to fix, and
keeps it inline with the layout switch over (e.g. #19122), but it's also
easy to keep around for longer.

This PR essentially just does a search and replace on the documentation,
it doesn't update the docs to include info about layers. I'm working on
that separately.
huonw added a commit that referenced this pull request Jun 1, 2023
This updates the AWS Lambda and Google Cloud Functions documentation to
be appropriate for Pants 2.18, which includes:

- updates for the migration from Lambdex to 'native'/zip (continuing
#19067 and #19122)
- describing how to build a Lambda layer (documenting #19123)
- updating the AWS Lambda docs for renaming from `python_awslambda` to
`python_aws_lambda_function` (finishing off #19216)
@huonw
Copy link
Contributor Author

huonw commented Jun 6, 2023

We can defer any further cleverness to future improvements

Issue tracking one possible improvement: #19256

huonw added a commit that referenced this pull request Aug 6, 2023
This ensures the `python_aws_lambda_layer` target type exists when
activating the `pants.backend.awslambda.python` backend. This was missed
from #19123, oops!
huonw added a commit that referenced this pull request Aug 8, 2023
In yet more follow up to #19123 and #19550 for #18880, this ensures the
`PythonAwsLambdaLayerFieldSet` is part of the `PackageFieldSet` union,
so that `pants package path/to:some-layer` actually works.

I clearly didn't test this properly in #19123 or #19550, but now I have:
in a separate repo `PANTS_SOURCE=~/... pants package path/to:some-layer`
produces `dist/path.to/some-layer.zip`, with the expected contents. 🎉
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow building AWS Lambda Layers
2 participants