Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workspace checksum to avoid unnecessary terraform init execution #90

Merged
merged 1 commit into from
Mar 20, 2023

Conversation

bobh66
Copy link
Collaborator

@bobh66 bobh66 commented Feb 23, 2023

Description of your changes

Maintain a checksum of the workspace directory content and skip running terraform init if the checksum value has not changed since the last reconciliation.

The workspace content is regenerated on every reconciliation so that any changes to the inline or remote workspace, anything in the workspace.spec, the ProviderConfig credentials or configuration will be reflected in the new workspace content and the checksum will be changed accordingly. If the checksum changes then terraform init is re-executed to pick up the new changes. But if nothing has changed then terraform init is redundant and just wastes (a lot of) CPU, and so it can be skipped.

The checksum is stored in the Workspace Status, which does not get updated until after the SECOND reconciliation cycle after creation (status updates on Create are overwritten) so terraform init will be executed for the first TWO reconciliations but after that it will only be executed on changes to the content of the workspace/configuration/credentials/etc.

Fixes #81

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Tested in a development environment running a variety of Workspaces, and verified through logging and timestamp analysis that terraform init is only called when expected.

@bobh66
Copy link
Collaborator Author

bobh66 commented Feb 24, 2023

Also note that reducing the number of terraform init calls is especially important now that we are using the shared provider cache, since terraform only allows one terraform init instance to access the cache at a time and we enforce this behavior with a semaphore to prevent crashes in the CLI. Reconciliation throughput can be increased significantly if we don't require all reconciliations to go through this bottleneck on every cycle.

Copy link
Member

@ytsarev ytsarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting optimization, very much required, thanks @bobh66 !

I've left some comments for the first round of review.

@@ -236,6 +236,17 @@ func (h Harness) DeleteCurrentWorkspace(ctx context.Context) error {
return Classify(err)
}

// GenerateChecksum calculates the md5sum of the workspace to see if terraform init needs to run
func (h Harness) GenerateChecksum(ctx context.Context) (string, error) {
command := "/usr/bin/find . -type f -exec /usr/bin/md5sum {} + | LC_ALL=C /usr/bin/sort | /usr/bin/md5sum | /usr/bin/awk '{print $1}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interested in implementation details here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly looking for something simple that works. I'm happy to change it to a more robust implementation now that I know the basic concept is sound.

Copy link
Collaborator Author

@bobh66 bobh66 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dirhash appears to struggle with directories that include symlinks:

"error": "cannot calculate workspace checksum: read /tf/ab7e0691-b45d-4d0b-a14b-6bb7c1e06142/.terraform/providers/registry.terraform.io/hashicorp/aws/4.56.0/linux_amd64: is a directory"

The implementation checks for directories but not symlinks, which it leaves in the list of "files" and that causes the error.

I know in Python I could easily override the implementation of DirFiles() to fix the error without needing help from upstream. Is there anything similar in go? Or should I just re-implement HashDir and DirFiles locally and use the dirhash hashing function to do the calculations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobh66 that does not sound trivial :) what about we merge the current find based implementation and create an Issue for optimization in the future?

@@ -142,6 +142,7 @@ type WorkspaceSpec struct {
type WorkspaceStatus struct {
xpv1.ResourceStatus `json:",inline"`
AtProvider WorkspaceObservation `json:"atProvider,omitempty"`
Checksum string `json:"checksum,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we put this one under AtProvider as well? Semantically looks like WorkspaceObservation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

internal/controller/workspace/workspace.go Show resolved Hide resolved
@bobh66 bobh66 changed the title Add workspace checksum to avoid unnecessary terraform init execution Draft: Add workspace checksum to avoid unnecessary terraform init execution Mar 9, 2023
Signed-off-by: Bob Haddleton <bob.haddleton@nokia.com>
@bobh66 bobh66 changed the title Draft: Add workspace checksum to avoid unnecessary terraform init execution Add workspace checksum to avoid unnecessary terraform init execution Mar 9, 2023
Copy link
Member

@ytsarev ytsarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobh66 thanks for addressing suggestions, lgtm, is this one ready to merge?

@bobh66
Copy link
Collaborator Author

bobh66 commented Mar 20, 2023

Thanks @ytsarev - I think we can start with this implementation and refine as needed. Thanks!

@bobh66 bobh66 merged commit 509724e into upbound:main Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provider runs terraform init on every reconciliation
2 participants