
[WIP] Proposal to modify provider meta to enable specifying the region at each resource #31517

Open
wants to merge 25 commits into main

Conversation

brittandeyoung (Collaborator)

Description

The pain around multi-region deployments in Terraform is an area where I feel improving this functionality could have a high impact. Today you have to specify a separate provider per region, use provider aliases, and define the same resource at least once per region you want to deploy it to (repeated code). I have been thinking about this issue from two sides.
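
For context, the status quo requires one provider block per region, an alias for each non-default region, and a copy of each resource per region. A minimal illustration (the resource type and names here are just examples):

provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "use2"
  region = "us-east-2"
}

# The same resource must be written once per target region.
resource "aws_lightsail_bucket" "use1" {
  name      = "example-bucket-use1"
  bundle_id = "small_1_0"
}

resource "aws_lightsail_bucket" "use2" {
  provider  = aws.use2
  name      = "example-bucket-use2"
  bundle_id = "small_1_0"
}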

This pull request is a proof of concept, with modifications to the core provider meta, the tags_interceptor, and two resources to demonstrate the functionality. These modifications will allow resources to be updated so that the region can be specified at the resource level, with a default region used when one is not specified.

This is accomplished by replacing the single AWSClient inside the provider meta with a map of all regions to their AWSClients. This allows either defaulting to the region provided in the provider configuration or specifying a region at the resource level to determine the region the resource will be deployed and managed in.

The map of regions can be limited by providing a list to the newly added allowed_regions provider setting, e.g. allowed_regions = ["us-east-1", "us-east-2"]. If this setting is not provided, a client will be established for every region available to the account.
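
For illustration, a provider configuration under this proposal might look like the following (allowed_regions is the new setting added in this PR; everything else is standard provider configuration):

provider "aws" {
  # Default region, used by any resource that does not specify one.
  region = "us-east-1"

  # New setting from this PR: clients are only established for these regions.
  allowed_regions = ["us-east-1", "us-east-2"]
}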

This will accomplish the following:

  1. Enable each resource to be modified so that a region can be specified per resource, and even allow the same resource to be deployed to multiple regions using a for_each.
  2. Preserve the current behavior of defaulting to the provider-configured region, so as not to introduce a breaking change.
  3. Allow deploying to multiple regions with only a single default aws provider configuration.

So far, this PR has modified only two resources for testing:

  1. aws_lightsail_bucket - Modified to allow passing the region at the resource level (as sketched below); tests have been added to verify that this works as intended when a region other than the default is passed.
  2. aws_lightsail_disk - Modified to work with the new provider meta without adding support for region at the resource level. This lets us verify that we can introduce the new meta while keeping existing functionality for all resources, and roll out resource-level region support at its own pace.
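
As a sketch of what the modified aws_lightsail_bucket looks like in use (the region argument is the new resource-level setting from this PR; other values are illustrative):

resource "aws_lightsail_bucket" "test" {
  name      = "example-bucket"
  bundle_id = "small_1_0"

  # New in this PR: deploy to a region other than the provider default.
  region = "us-east-2"
}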

I am not an expert on how the provider meta is configured, so I am sure optimizations could be made, but I feel a solution similar to this would provide the best end-user experience when deploying to multiple regions. There may well be more modifications needed before this proposal is ready to merge. Additionally, every resource would need to be updated to use the new provider meta (I only did the two for testing, but this is as easy as a massive find-and-replace).

Relations

Relates #25308

References

Output from Acceptance Testing

$ make testacc TESTARGS='-run=TestAccLightsailBucket_' PKG=lightsail     
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/lightsail/... -v -count 1 -parallel 20  -run=TestAccLightsailBucket_ -timeout 180m
=== RUN   TestAccLightsailBucket_basic
=== PAUSE TestAccLightsailBucket_basic
=== RUN   TestAccLightsailBucket_region
=== PAUSE TestAccLightsailBucket_region
=== RUN   TestAccLightsailBucket_BundleId
=== PAUSE TestAccLightsailBucket_BundleId
=== RUN   TestAccLightsailBucket_disappears
=== PAUSE TestAccLightsailBucket_disappears
=== RUN   TestAccLightsailBucket_tags
=== PAUSE TestAccLightsailBucket_tags
=== CONT  TestAccLightsailBucket_basic
=== CONT  TestAccLightsailBucket_disappears
=== CONT  TestAccLightsailBucket_tags
=== CONT  TestAccLightsailBucket_BundleId
=== CONT  TestAccLightsailBucket_region
--- PASS: TestAccLightsailBucket_disappears (128.44s)
--- PASS: TestAccLightsailBucket_basic (162.29s)
--- PASS: TestAccLightsailBucket_BundleId (264.69s)
--- PASS: TestAccLightsailBucket_region (272.90s)
--- PASS: TestAccLightsailBucket_tags (362.18s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/lightsail  365.437s


$ make testacc TESTARGS='-run=TestAccLightsailDisk_' PKG=lightsail 
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/service/lightsail/... -v -count 1 -parallel 20  -run=TestAccLightsailDisk_ -timeout 180m
=== RUN   TestAccLightsailDisk_basic
=== PAUSE TestAccLightsailDisk_basic
=== RUN   TestAccLightsailDisk_Tags
=== PAUSE TestAccLightsailDisk_Tags
=== RUN   TestAccLightsailDisk_disappears
=== PAUSE TestAccLightsailDisk_disappears
=== CONT  TestAccLightsailDisk_basic
=== CONT  TestAccLightsailDisk_disappears
=== CONT  TestAccLightsailDisk_Tags
--- PASS: TestAccLightsailDisk_disappears (160.14s)
--- PASS: TestAccLightsailDisk_basic (179.56s)
--- PASS: TestAccLightsailDisk_Tags (380.70s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/lightsail  383.867s

...

@github-actions

Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions bot added the size/XL, client-connections, provider, service/ec2, service/lightsail, tests, verify, and needs-triage labels on May 21, 2023
@brittandeyoung added the proposal label on May 21, 2023
@AdamTylerLynch (Collaborator)

In the case of pilot light multi-region deployments, I think the approach presented here would generally work.

There are some interesting concerns around resource types that inherently support multi-region via primary/replica attributes or global attributes.

I also would want to address how Terraform would destroy resources in a region if the practitioner removes that region. Say they want to leave ap-south-1: how could they ensure that all resources were removed?

See #27758 for some good comments.

@brittandeyoung (Collaborator, Author) commented May 22, 2023

@AdamTylerLynch In the case of resources that support multiple regions, I think this will simplify those implementations. Looking at the example of aws_dynamodb_table_replica: currently the code creates a new session to the target region, whereas with this update it could instead select the correct client for the target region, removing the need to configure these clients within each multi-region resource.

If a user wanted to remove all resources from a region, they would need to remove the Terraform configuration for those resources and allow Terraform to destroy them, or run a targeted destroy.

If a user wanted to move from, say, ap-south-1 to us-west-1, they would simply update the region value for each resource (hopefully they would have the target region in a local so it only requires updating a single value), and Terraform would handle destroying and recreating them all in the new region. Since the value is determined at the resource, dynamic values like locals, for_each, and count can be used.

In the basic example below, a user would simply remove a region from the local.regions list, and Terraform would handle destroying those resources.

locals {
  regions = [
    "us-east-1",
    "us-east-2",
  ]
}

resource "aws_lb" "test" {
  for_each           = toset(local.regions)
  region             = each.key
  name               = "test-lb-tf"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.allow_tls.id]
  subnets            = data.aws_subnets.public.ids

  tags = {
    Environment = "production"
  }
}

In the event that the user updated the region value for the provider, Terraform would behave as it does today and simply error out, stating that it could not find the resources. A destroy would need to be performed before updating the default region for the provider.

One thing I am still trying to figure out is the case where a user has been providing a region value and decides to remove it: the provider needs to be smart enough to fall back to the provider's default region. I am still working out how to make that work technically.

@justinretzolk removed the needs-triage label on May 22, 2023
@github-actions bot added the create and flex labels on May 24, 2023
@brittandeyoung (Collaborator, Author)

@AdamTylerLynch I have additionally added a central flex function that returns a proper error to the client when they attempt to deploy to a region that is not in the allowed_regions provider setting. This should help force the behavior of either updating the region or destroying the resource before the region is removed from the allowed list.
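
For example, with the provider limited to two regions, a configuration like the following would now fail with a proper error at plan/apply time rather than silently lacking a client (values are illustrative; the exact error wording is defined in the PR):

provider "aws" {
  region          = "us-east-1"
  allowed_regions = ["us-east-1", "us-east-2"]
}

resource "aws_lightsail_bucket" "invalid" {
  name      = "example-bucket"
  bundle_id = "small_1_0"
  region    = "eu-west-1" # not in allowed_regions, so the central flex function returns an error
}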

@gdavison (Contributor)

Thanks for all the thought and work you've put into this, @brittandeyoung. We're going to have some discussions on our team around the approach to take with this.

We see some definite benefits to multiple regions with one provider, but also some areas for caution, especially around how regionally separated AWS services are.

@brittandeyoung (Collaborator, Author)

> Thanks for all the thought and work you've put into this, @brittandeyoung. We're going to have some discussions on our team around the approach to take with this.
>
> We see some definite benefits to multiple regions with one provider, but also some areas for caution, especially around how regionally separated AWS services are.

@gdavison great! I will wait to hear from the provider team before I put any more work into this.

@YakDriver (Member)

YakDriver commented Jun 1, 2023

I personally like this idea.

  1. It could help with performance since you wouldn't need multiple providers (especially for people who support a lot of regions)
  2. It would simplify configurations 🎉 (e.g., replications, clusters, modules)
  3. There may be other things that could work this way (I dunno, but maybe assume_role or profile?)

@breathingdust added the external-maintainer label on Jul 25, 2023
@apparentlymart (Member)

apparentlymart commented Sep 1, 2023

Hello! I work on Terraform Core rather than on the AWS provider so I'm just an interested onlooker here and not intending to speak on behalf of the AWS provider team. I've had a quiet hope for the AWS provider to adopt a model like this for a long time, so I'm very excited to see this proposal/prototype. 😀

This new approach could potentially also help with another long-standing oddity: if one changes the region argument in the provider configuration today after some remote objects already exist, the provider typically tries to refresh the existing objects in the wrong region, gets a "not found" error, and assumes that the objects have been deleted. Unless the user checks the provider's work and rejects the plan, they could end up creating duplicate objects in the new region and leaving behind forgotten objects in the old region.

Here's one way I could imagine improving that region change situation:

  • The region argument in each resource type is declared as Optional: true, Computed: true, so the provider can populate it if the module author doesn't.
  • If per-resource region is unset in the configuration, the provider automatically populates it during planning (i.e. CustomizeDiff in the SDK) with the same region as configured in the corresponding provider "aws" block, which now serves as the "default" region for any resource that doesn't specify one.
    • To allow upgrading from state objects that were created with earlier versions of the provider, each resource type would need an upgrade rule to retrofit a value for region into any resource instance that doesn't already have one, based on the current provider configuration. For correct results, the author would need to leave the provider-level region unchanged while resolving the upgrade so that the correct per-resource value can be retrofitted and then the configuration and state will match during the first plan after upgrading.
    • I think this rule would effectively replace the empty-region-handling DiffSuppressFunc in the current PR, since there would no longer be any case where a resource would have no region populated except during the state upgrade step, which always runs before any planning steps.
  • When refreshing existing objects and when destroying, the provider uses the region attribute on the resource to decide which region to make the query to, ignoring the "default region" in the provider "aws" block.
  • When diffing, if a resource instance region in configuration is different than the prior state, signal that replacement is required ("Force new" in SDK terminology) so that Terraform will propose to delete the object in the old region and create a new object in the new region.
  • If the region in the provider "aws" block changes when remote objects already exist, the provider treats that as a change to desired state of all resources which don't explicitly set region, making it now possible to mass-migrate many objects from one region to another by replacement.

The above is just one design idea and I listed out these details only in case they are useful; some parts of this are probably challenging to implement with the current capabilities of the plugin SDK, but I think it is all valid within and consistent with the spirit of Terraform's resource instance change lifecycle. The main thing I'd love to see, details aside, is that changing region in the provider "aws" block would no longer cause the provider to be confused by "not found" errors and instead have it treat the region change like a normal change to desired state.
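
To make that lifecycle concrete, here is a hypothetical configuration with comments paraphrasing the behavior sketched above (this illustrates the design idea only, not actual provider output):

provider "aws" {
  region = "us-west-2" # serves as the "default" region in this design
}

resource "aws_lightsail_bucket" "example" {
  name      = "example-bucket"
  bundle_id = "small_1_0"

  # region omitted: because the attribute would be Optional + Computed, the
  # provider would populate region = "us-west-2" during planning.
}

# If the provider-level region later changes to, say, "us-east-1", the
# populated region on this resource changes too. Because region would be
# "Force new", Terraform would plan a replacement: destroy in us-west-2,
# create in us-east-1, instead of being confused by "not found" errors.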

Again, I'm not speaking on behalf of the AWS provider team and so please don't take any actions based on this until the provider team has said something about it! I don't want anyone to spend time on implementing something like this if the provider team would eventually oppose it on principle anyway.
