Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker compose restart pragma #1442

Merged
merged 2 commits into from
Oct 10, 2024
Merged

Conversation

frankhereford
Copy link
Member

@frankhereford frankhereford commented Oct 8, 2024

Associated issues

This PR is intended to close cityofaustin/atd-data-tech#18960.

Testing

This is the easiest thing to test in the world. Please check out this branch, cd into the moped-database directory, and execute a hasura-cluster start. Did it work? Did you get a running postgres container and a running graphql-engine container?

A note on testing this

The number of retries is chosen out of thin air -- it can be 30, 300, 3000, whatever -- just not 1. It needs to be large enough to keep graphql-engine retrying the PG container long enough for the PG container to initialize the fresh DB, which it does on start before it begins accepting connections. I'm asking that everyone who is a regular moped contributor to test this because really, I want to test that this magic number 30 works for all of us. Essentially whoever has the slowest disk or the slowest computer overall will be the person who determines if 30 is suitable.

Background

This is copy borrowed from a conversation between @mddilley and myself.

Super short story, is that here’s what happens when you fire up the DB compose stack without a restart pragma or my proposed solution:

  1. Both the DB and the graphql-engine containers fire up at the same time.
  2. The DB container takes maybe 5-10 seconds to initialize itself before it accept connections
  3. The graphql-engine container reaches out to the PG container during this initialization and finds that it is not yet accepting connections and terminates.

It’s because of this, we have this restart pragma where the compose system restarts it over and over until the PG container is initialized, essentially making a “retry connection” functionality.

So, I propose, instead of using the restart functionality, we rather ask the graphql-engine container to retry using the environment variable HASURA_GRAPHQL_NO_OF_RETRIES, like this.


Ship list

Copy link
Member

@chiaberry chiaberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stopped my cluster and restarted by replicating from prod. Spun up like normal 👍🏼

Copy link
Collaborator

@mddilley mddilley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with and without replicating, and my stack came up just like normal. 🚢

@@ -21,6 +20,7 @@ services:
- 'HASURA_GRAPHQL_JWT_SECRET={"type":"RS256","jwk_url": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_U2dzkxfTv/.well-known/jwks.json","claims_format": "stringified_json"}'
- HASURA_GRAPHQL_ADMIN_SECRET=hasurapassword
- ACTIVITY_LOG_API_SECRET=hasurapassword
- HASURA_GRAPHQL_NO_OF_RETRIES=30
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a TIL for me. I was a little confused about the docs saying it was "deprecated" but then I read this blurb.

In v2.0, the values of the following env vars are used to define the connection parameters of the default database while updating an existing instance or while starting a fresh instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding that Mike, and double thanks for checking so thoroughly into the documentation on the variable. I had not seen the depreciation notice that you pointed out -- I saw the variable name somewhere, thought I'd give it a whirl and it did exactly what I had expected it to do. I don't think they are using the word deprecation here quite right -- they should call it a one-time-use value or something like that. But what I really want to say is - thank you for your detailed review.

Copy link
Contributor

@tillyw tillyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested by spinning up my cluster with and without replicating and both worked smoothly! 🚢

@mateoclarke
Copy link
Contributor

./hasura-cluster start worked. having some issues connecting to the RR in my replicate attempts but I don't think that's related to this change.

@frankhereford frankhereford merged commit c6772f4 into main Oct 10, 2024
4 checks passed
@frankhereford frankhereford deleted the docker-compose-restart-pragma branch October 10, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adjust Moped's local stack to use the correct restart pragma for graphql-engine
5 participants