-
-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shutdown Doc worker when it is not considered as available in Redis #831 #856
Shutdown Doc worker when it is not considered as available in Redis #831 #856
Conversation
c0429b3
to
6778e82
Compare
app/gen-server/lib/DocWorkerMap.ts
Outdated
if (nbFailures >= 3) { | ||
log.error( | ||
'DocWorkerMap: Presence checker failed 3 times, considering the worker "%s" is not available anymore', | ||
workerId | ||
); | ||
return cb(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this is useful or whether that's what we would want:
- The only case I think we could ultimately enter in this condition is because of a failure when calling
sismemberAsync()
; - I tested simulating network issues (in my case, I simulated some
ECONNREFUSED
), no exception has been raised, and I never entered in this condition; - I feel like it is not the role of this method to handle connection issues with Redis, I see that errors are already logged above in this class;
I tend to think that we may just simplify everything:
- not count the number of failure in this recursive setTimeout;
- just log for each exception;
- If we want to shutdown the worker because it cannot join the redis server, it's another subject so another issue/PR (that I can take care of);
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I chose to do as I planed above.
buildtools/webpack.config.js
Outdated
test: /\.m?js/, | ||
resolve: { | ||
fullySpecified: false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit this is a piece of code I found on the internet to fix issues while building after I upgraded Sinon, and that I don't understand fully how it works neither the potential consequences.
I can try to know more about that next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to have a log of what those issues were?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here are the logs when I run yarn build:prod
and without the above patch. Removing the node_modules/
folder doesn't solve anything.
yarn run v1.22.21
$ buildtools/build.sh
+ tsc --build
+ buildtools/update_type_info.sh app
+ webpack --config buildtools/webpack.config.js --mode production
assets by status 8.11 MiB [cached] 21 assets
orphan modules 1.13 MiB [orphan] 236 modules
runtime modules 20.6 KiB 46 modules
modules by path ./node_modules/ 8.18 MiB
javascript modules 7.86 MiB 652 modules
json modules 327 KiB 3 modules
modules by path ./app/ 3.87 MiB 279 modules
modules by path ./test/ 219 KiB 28 modules
modules by path ./stubs/app/ 671 bytes
modules by path ./stubs/app/client/ui/*.ts 386 bytes 3 modules
./stubs/app/common/version.ts 156 bytes [built] [code generated]
./stubs/app/client/components/Banners.ts 129 bytes [built] [code generated]
external "jQuery" 42 bytes [built] [code generated]
external "alert" 42 bytes [built] [code generated]
./package.json 6.45 KiB [built] [code generated]
WARNING in ./stubs/app/common/version.ts 2:23-42
Should not import the named export 'version' (imported as 'packageJson') from default-exporting module (only default export is available soon)
@ ./app/client/ui/AppHeader.ts 5:0-46 40:170-185 40:191-208 40:230-247
@ ./app/client/ui/errorPages.ts 3:0-52 90:25-34
@ ./app/client/errorMain.ts 1:0-56 2:0-12
ERROR in ./node_modules/sinon/pkg/sinon-esm.js 5166:15-22
Module not found: Error: Can't resolve 'process/browser' in '/home/florentpro/projects/grist-core/node_modules/sinon/pkg'
Did you mean 'browser.js'?
BREAKING CHANGE: The request 'process/browser' failed to resolve only because it was resolved as fully specified
(probably because the origin is strict EcmaScript Module, e. g. a module with javascript mimetype, a '*.mjs' file, or a '*.js' file where the package.json contains '"type": "module"').
The extension in the request is mandatory for it to be fully specified.
Add the extension to the request.
resolve 'process/browser' in '/home/florentpro/projects/grist-core/node_modules/sinon/pkg'
Parsed request is a module
using description file: /home/florentpro/projects/grist-core/node_modules/sinon/package.json (relative path: ./pkg)
Field 'browser' doesn't contain a valid alias configuration
resolve as module
looking for modules in /home/florentpro/projects/grist-core
/home/florentpro/projects/grist-core/process doesn't exist
looking for modules in /home/florentpro/projects/grist-core/ext
/home/florentpro/projects/grist-core/ext/process doesn't exist
looking for modules in /home/florentpro/projects/grist-core/stubs
/home/florentpro/projects/grist-core/stubs/process doesn't exist
looking for modules in /home/florentpro/projects/grist-core/node_modules
existing directory /home/florentpro/projects/grist-core/node_modules/process
using description file: /home/florentpro/projects/grist-core/node_modules/process/package.json (relative path: .)
using description file: /home/florentpro/projects/grist-core/node_modules/process/package.json (relative path: ./browser)
Field 'browser' doesn't contain a valid alias configuration
/home/florentpro/projects/grist-core/node_modules/process/browser doesn't exist
@ ./test/common/InactivityTimer.ts 4:0-31 8:10-19
@ ./test/client-harness/client.js
webpack 5.73.0 compiled with 1 error and 1 warning in 37312 ms
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have fixed properly the problem here: 5e874f7
Though let's see what the CI tells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled a bit again, but here is what I hope would work: https://github.com/gristlabs/grist-core/pull/856/files#diff-249979d31b349e184d0568ed390ed1ad3fd8ca63e59b9f0c38fcde4d9ec445c5
8eb4e75
to
a68d2f1
Compare
2e86558
to
db2e03a
Compare
The test failure is not due to the instabilities. I convert this PR into draft until it is fixed. |
Overall, the logic here feels similar to a container health check. It seems fairly reasonable to expect that a Grist installation with multiple workers is containerized, and probably has health checks running on each Grist instance? In our own deployments we use the Not against this change, just proposing an alternative to see what you think @fflorent. Skimmed the PR, and it looks generally in good shape except it could be worth clearing any pending timeout in the |
fe3ec42
to
f14a0ad
Compare
@paulfitz Thanks for your remark! Sounds wise and much simpler, indeed. After discussing with ops, that seems to answer the need. I applied your requested change, I just kept the upgrade of the Sinon dependency, as it was a painful move and it would remove a barrier to use features of newer versions in the future. If that's OK for you? |
f14a0ad
to
4d0c602
Compare
2502232
to
0720b4c
Compare
@fflorent any thoughts on the |
@paulfitz sould be fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, looks like I never submitted this small comment :(
The next commits requires tickAsync which was introduced in newer versions
8cc5b3d
to
6955609
Compare
@paulfitz Right, fixed, rebased and pushed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fflorent !
Context
In #831, we asked whether we could make a doc worker available again when they can be reached again.
Instead of that, we suggest to just shutdown the worker and assume that another worker would be created instead.
Proposed solution
This PR proposes that the doc worker shutdowns itself in two cases:
The check is at most every 30 seconds (30 seconds + the duration of requesting Redis).