-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getSnippetHash
: Use regexp instead of parsing whole AST
#759
Conversation
When calculating the env hash to use as cache keys, we don't need an exact list of imports Therefore, to improve performance, we can replace the current strategy of using the AST by a regexp strategy. This new strategy can lead to false positives (e.g. a string containing import 'foo') but this is not an issue for the hashing function. All that matters is that the hash is consistent and accounts for all files (no false negatives)
Benchstat (compared to main):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a simple question. Overall LGTM. The performance improvements are amazing! I'm pretty sure the speed up is because of the reduction in the number of allocations, though I could be wrong.
Yeah, the issue with the AST is that you have to parse all of it and traverse all nodes |
d64804b
to
e999249
Compare
Made a stupid mistake in the previous PR: #759 This fixes it and adds another benchmark test to ensure it doesn't happen again. I also removed the Github Actions benchmark test, as it's not really useful, anytime we change the tests, we'll get erroneous results which will be annoying. Instead, I added the benchmark tests to the Drone run, we can compare whenever we want.
* Fix `getSnippetHash` not considering all files Made a stupid mistake in the previous PR: #759 This fixes it and adds another benchmark test to ensure it doesn't happen again. I also removed the Github Actions benchmark test, as it's not really useful, anytime we change the tests, we'll get erroneous results which will be annoying. Instead, I added the benchmark tests to the Drone run, we can compare whenever we want. * linting * Add changelog, will release straight away
When calculating the env hash to use as cache keys, we don't need an exact list of imports
Therefore, to improve performance, we can replace the current strategy of using the AST by a regexp strategy.
This new strategy can lead to false positives (e.g. a string containing import 'foo') but this is not an issue for the hashing function.
All that matters is that the hash is consistent and accounts for all files (no false negatives)
On actual huge real-life environments, I have seen improvements of up to 95% (from 2s to 100ms)