Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Workers no longer halt startup while the database schema needs an update #11344

Closed
anoadragon453 opened this issue Nov 15, 2021 · 4 comments
Closed
Labels
X-Release-Blocker Must be resolved before making a release

Comments

@anoadragon453
Copy link
Member

anoadragon453 commented Nov 15, 2021

When redeploying matrix.org today on 2a93005, we started with a test worker. The commit included a new schema migration (this one), which should have preventing the test worker from starting up.

Instead, the test worker failed to notice the new schema file, which is a bit worrying:

test worker logs
2021-11-15 11:57:25,378 - root - 345 - WARNING - main - ***** STARTING SERVER *****
2021-11-15 11:57:25,378 - root - 346 - WARNING - main - Server /home/synapse/src/synapse/app/generic_worker.py version 1.47.0rc2 (b=matrix-org-hotfixes,2a93005a7)
2021-11-15 11:57:25,378 - root - 347 - INFO - main - Server hostname: matrix.org
2021-11-15 11:57:25,379 - root - 348 - INFO - main - Instance name: test_worker-1
2021-11-15 11:57:25,379 - synapse.server - 309 - INFO - main - Setting up.
2021-11-15 11:57:25,485 - synapse.storage.databases - 65 - INFO - main - [database config 'master']: Checking database server
2021-11-15 11:57:25,488 - synapse.storage.databases - 71 - INFO - main - [database config 'master']: Preparing for databases ['main', 'state']
2021-11-15 11:57:25,488 - synapse.storage.prepare_database - 116 - INFO - main - ['main', 'state']: Checking existing schema version
2021-11-15 11:57:25,502 - synapse.storage.prepare_database - 124 - INFO - main - ['main', 'state']: Existing schema is 65 (+4 deltas)
2021-11-15 11:57:25,503 - synapse.storage.databases - 84 - INFO - main - [database config 'master']: Starting 'main' database
[...]
2021-11-15 11:57:25,921 - synapse.storage.databases.main.event_push_actions - 540 - INFO - main - Searching for stream ordering 1 month ago
2021-11-15 11:57:25,940 - synapse.storage.databases.main.event_push_actions - 545 - INFO - main - Found stream ordering 1 month ago: it's 2393068710
2021-11-15 11:57:25,940 - synapse.storage.databases.main.event_push_actions - 547 - INFO - main - Searching for stream ordering 1 day ago
2021-11-15 11:57:25,955 - synapse.storage.databases.main.event_push_actions - 552 - INFO - main - Found stream ordering 1 day ago: it's 2468697156
2021-11-15 11:57:25,957 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for pushers(id)
2021-11-15 11:57:25,959 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for deleted_pushers(stream_id)
2021-11-15 11:57:25,960 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for push_rules_stream(stream_id)
2021-11-15 11:57:26,251 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for pushers(id)
2021-11-15 11:57:26,251 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for deleted_pushers(stream_id)
2021-11-15 11:57:26,252 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for local_group_updates(stream_id)
2021-11-15 11:57:26,282 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for device_lists_stream(stream_id)
2021-11-15 11:57:26,283 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for user_signature_stream(stream_id)
2021-11-15 11:57:26,284 - synapse.storage.util.id_generators - 76 - INFO - main - initialising stream generator for device_lists_outbound_pokes(stream_id)
2021-11-15 11:57:26,469 - synapse.storage.databases - 101 - INFO - main - [database config 'master']: Starting 'state' database
2021-11-15 11:57:26,473 - synapse.storage.databases - 115 - INFO - main - [database config 'master']: prepared
2021-11-15 11:57:26,474 - synapse.server - 312 - INFO - main - Finished setting up.

The tail end of the applied_schema_deltas table (sorted by "version" asc) is:

      63 | 63/04add_presence_stream_not_offline_index.sql
      64 | 64/01msc2716_chunk_to_batch_rename.sql.postgres
      65 | 65/01msc2716_insertion_event_edges.sql
      65 | 65/02_thread_relations.sql
      65 | 65/03remove_hidden_devices_from_device_inbox.sql
      65 | 65/04_local_group_updates.sql
(268 rows)

The current state of the background_updates table is:

matrix=> select * from background_updates;
               update_name               |                          progress_json                           | depends_on | ordering 
-----------------------------------------+------------------------------------------------------------------+------------+----------
 remove_hidden_devices_from_device_inbox | {}                                                               |            |     6503
 local_group_updates_index               | {}                                                               |            |     6504
 event_thread_relation                   | {"last_event_id":"$redacted"}                                    |            |     6502
(3 rows)

matrix=> 

The process went:

  • Check out matrix-org-hotfixes
  • pip install -e .[all]
  • restart the test worker

So why couldn't the test worker see the src/synapse/storage/schema/main/delta/65/05remove_deleted_devices_from_device_inbox.sql migration file?

Holding off on v1.47.0rc3 until this is resolved.

@anoadragon453 anoadragon453 added the X-Release-Blocker Must be resolved before making a release label Nov 15, 2021
@richvdh
Copy link
Member

richvdh commented Nov 15, 2021

The current state of the applied_schema_deltas table is:

that's not the applied_schema_deltas table.

@anoadragon453
Copy link
Member Author

The current state of the applied_schema_deltas table is:

that's not the applied_schema_deltas table.

Oops, have fixed.

@anoadragon453
Copy link
Member Author

@richvdh deduced that #11255 caused _upgrade_existing_database to no longer run on workers - which means that workers will not prevent startup if the schema has not been changed.

This could potentially wreck havoc if you restart your worker processes (expecting a certain schema) before the main process has performed the schema update. Thus, this is a release blocker.

@anoadragon453 anoadragon453 changed the title Test worker process didn't notice new schema file upon startup Workers no longer halt startup while the database schema needs an update Nov 15, 2021
@anoadragon453
Copy link
Member Author

fixed in #11346.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
X-Release-Blocker Must be resolved before making a release
Projects
None yet
Development

No branches or pull requests

2 participants