-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add workspace checksum to avoid unnecessary terraform init execution #90
Conversation
Also note that reducing the number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting optimization, very much required, thanks @bobh66 !
I've left some comments for the first round of review.
@@ -236,6 +236,17 @@ func (h Harness) DeleteCurrentWorkspace(ctx context.Context) error { | |||
return Classify(err) | |||
} | |||
|
|||
// GenerateChecksum calculates the md5sum of the workspace to see if terraform init needs to run | |||
func (h Harness) GenerateChecksum(ctx context.Context) (string, error) { | |||
command := "/usr/bin/find . -type f -exec /usr/bin/md5sum {} + | LC_ALL=C /usr/bin/sort | /usr/bin/md5sum | /usr/bin/awk '{print $1}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interested in implementation details here
- Why md5?
- Why was shell-based implementation chosen?
- Can we use something golang native like https://pkg.go.dev/golang.org/x/mod/sumdb/dirhash instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly looking for something simple that works. I'm happy to change it to a more robust implementation now that I know the basic concept is sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dirhash appears to struggle with directories that include symlinks:
"error": "cannot calculate workspace checksum: read /tf/ab7e0691-b45d-4d0b-a14b-6bb7c1e06142/.terraform/providers/registry.terraform.io/hashicorp/aws/4.56.0/linux_amd64: is a directory"
The implementation checks for directories but not symlinks, which it leaves in the list of "files" and that causes the error.
I know in Python I could easily override the implementation of DirFiles() to fix the error without needing help from upstream. Is there anything similar in go? Or should I just re-implement HashDir and DirFiles locally and use the dirhash hashing function to do the calculations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobh66 that does not sound trivial :) what about we merge the current find
based implementation and create an Issue for optimization in the future?
apis/v1beta1/workspace_types.go
Outdated
@@ -142,6 +142,7 @@ type WorkspaceSpec struct { | |||
type WorkspaceStatus struct { | |||
xpv1.ResourceStatus `json:",inline"` | |||
AtProvider WorkspaceObservation `json:"atProvider,omitempty"` | |||
Checksum string `json:"checksum,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we put this one under AtProvider
as well? Semantically looks like WorkspaceObservation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
Signed-off-by: Bob Haddleton <bob.haddleton@nokia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bobh66 thanks for addressing suggestions, lgtm, is this one ready to merge?
Thanks @ytsarev - I think we can start with this implementation and refine as needed. Thanks! |
Description of your changes
Maintain a checksum of the workspace directory content and skip running
terraform init
if the checksum value has not changed since the last reconciliation.The workspace content is regenerated on every reconciliation so that any changes to the inline or remote workspace, anything in the workspace.spec, the ProviderConfig credentials or configuration will be reflected in the new workspace content and the checksum will be changed accordingly. If the checksum changes then
terraform init
is re-executed to pick up the new changes. But if nothing has changed thenterraform init
is redundant and just wastes (a lot of) CPU, and so it can be skipped.The checksum is stored in the Workspace Status, which does not get updated until after the SECOND reconciliation cycle after creation (status updates on Create are overwritten) so
terraform init
will be executed for the first TWO reconciliations but after that it will only be executed on changes to the content of the workspace/configuration/credentials/etc.Fixes #81
I have:
make reviewable
to ensure this PR is ready for review.How has this code been tested
Tested in a development environment running a variety of Workspaces, and verified through logging and timestamp analysis that
terraform init
is only called when expected.