Motivation

Supporting rolling updates is one of the most important requirements for Druid, but it is also an area where it is easy to make mistakes. Such mistakes can block users from updating their clusters to the latest version of Druid, so we should be able to catch them before they are released. (#6051)
Proposed changes
I would propose to add an integration test which checks various system statuses during a rolling update. Like the existing integration tests, it would run against a Druid cluster running in Docker containers. It would also be able to run in Travis, so that we can find incompatible changes as soon as possible.
The rolling update test program accepts the below arguments:

| Argument | IsMandatory | Default |
| --- | --- | --- |
| Hash of commit for old version | false | HEAD^ |
| Hash of commit for new version | false | HEAD |
| Path to configuration files for old version | true | |
| Path to configuration files for new version | false | old configuration files |
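The argument handling could be wired up with a standard parser. The sketch below is illustrative only; the flag names (`--old-commit`, `--new-commit`, `--old-config`, `--new-config`) are assumptions, not part of this proposal.

```python
import argparse

def build_parser():
    # Hypothetical CLI for the rolling update test; flag names are
    # illustrative assumptions, not part of the proposal.
    parser = argparse.ArgumentParser(description="Druid rolling update test")
    parser.add_argument("--old-commit", default="HEAD^",
                        help="hash of commit for the old version")
    parser.add_argument("--new-commit", default="HEAD",
                        help="hash of commit for the new version")
    parser.add_argument("--old-config", required=True,
                        help="path to configuration files for the old version")
    parser.add_argument("--new-config", default=None,
                        help="path to configuration files for the new version")
    return parser

# The default for the new configuration files is the old configuration
# files, applied after parsing.
args = build_parser().parse_args(["--old-config", "conf/old"])
if args.new_config is None:
    args.new_config = args.old_config
```

Only the old-version configuration path is mandatory; everything else falls back to the defaults in the table above.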
The test Druid cluster consists of 1 overlord, 1 broker, 1 coordinator, 2 middleManagers, and 1 historical.
Since stream ingestion is where unexpected incompatible changes usually happen, I would propose to test the below scenario. The steps would be executed sequentially.
Cluster initialization
- Build both the old and the new versions.
- Start a cluster of the old version.
- Start a Kafka supervisor.
- Produce some events; check task statuses and query results.
- Checkpoint the supervisor and wait for segments to be loaded on historicals.
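Several steps in this scenario ("wait for segments to be loaded") amount to polling a status endpoint until a condition holds or a timeout expires. A minimal generic helper for that, assuming nothing about the actual Druid APIs, might look like:

```python
import time

def wait_until(condition, timeout_sec=120.0, poll_interval_sec=5.0):
    """Poll `condition` until it returns True or `timeout_sec` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval_sec)
    return False

# Usage would look like: wait_until(lambda: segments_loaded("historical-1")),
# where segments_loaded is a hypothetical check against the coordinator.
```

The same helper would serve the historical and coordinator steps below, which also wait for segment loading.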
Historical test
- Update historicals to the new version.
- Wait for segments to be loaded.
- Check query results.
Overlord test
- Produce some events.
- Update the overlord to the new version.
- Check task statuses and query results.
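The "check task statuses" step reduces to asserting that no ingestion task failed across the update. A sketch, where the status dictionaries are simplified stand-ins for the real overlord task payloads, not their exact schema:

```python
def failed_task_ids(task_statuses):
    """Return the ids of tasks whose status is FAILED; an empty result
    means the update did not break the running ingestion tasks."""
    return [t["id"] for t in task_statuses if t["status"] == "FAILED"]

# Stub data standing in for the overlord's task status response.
statuses = [
    {"id": "index_kafka_wiki_1", "status": "RUNNING"},
    {"id": "index_kafka_wiki_2", "status": "SUCCESS"},
]
bad = failed_task_ids(statuses)
```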
MiddleManager test
- Checkpoint the supervisor.
- Update 1 of the 2 middleManagers.
- Produce some events.
- Check task statuses and query results.
- Update the other middleManager.
- Produce some events.
- Check task statuses and query results.
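The middleManager steps above follow a general pattern: update one node at a time and verify ingestion between updates. A sketch of such a driver, with the `update_node` and `verify` callbacks left abstract (they are assumptions, not existing test utilities):

```python
def rolling_update(nodes, update_node, verify):
    """Update `nodes` one at a time, running `verify` after each update.

    Returns the first node whose update broke verification, or None if
    every node was updated and verified successfully."""
    for node in nodes:
        update_node(node)
        if not verify():
            return node
    return None

# Stub demonstration: verification "fails" once the second node is updated.
updated = []
broken = rolling_update(["mm1", "mm2"], updated.append, lambda: len(updated) < 2)
```

Updating node by node is what makes the test meaningful: old and new versions run side by side, which is exactly where incompatible changes surface.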
Broker test
- Checkpoint the supervisor to publish segments.
- Update the broker to the new version.
- Produce some events.
- Check query results.
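"Check query results" can be implemented as a comparison of result sets that ignores row order, since ordering is not guaranteed unless the query specifies it. A minimal sketch, with the rows as simplified stand-ins for real query responses:

```python
def results_match(expected_rows, actual_rows):
    """Compare two query result sets while ignoring row order."""
    key = lambda row: sorted(row.items())
    return sorted(expected_rows, key=key) == sorted(actual_rows, key=key)

before = [{"page": "A", "count": 3}, {"page": "B", "count": 5}]
after = [{"page": "B", "count": 5}, {"page": "A", "count": 3}]
same = results_match(before, after)
```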
Coordinator test
- Checkpoint the supervisor to publish segments.
- Update the coordinator to the new version.
- Wait for segments to be loaded.
- Run a compaction task.
- Check the segment versions and query results.
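The segment version check after compaction can lean on the fact that Druid segment versions are ISO-8601 timestamp strings, so plain string comparison matches chronological order. A hedged sketch (the segment dictionaries are simplified stand-ins for the real segment metadata):

```python
def compaction_bumped_versions(segments, pre_compaction_version):
    """Return True if every segment now carries a version newer than the
    pre-compaction one. Versions are ISO-8601 timestamps, so string
    comparison orders them chronologically."""
    return all(s["version"] > pre_compaction_version for s in segments)

# Stub segment metadata; real entries come from the coordinator.
segments = [
    {"id": "ds_2018-09-01_a", "version": "2018-09-03T00:00:00.000Z"},
    {"id": "ds_2018-09-02_b", "version": "2018-09-03T00:00:00.000Z"},
]
bumped = compaction_bumped_versions(segments, "2018-09-01T00:00:00.000Z")
```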
#6208 should be implemented to support manual checkpointing.
If one of the tests fails, all task logs and system logs would be preserved and the Docker containers would not be stopped, as in the existing integration tests.
Rationale
I think this is the easiest way to automate testing of rolling updates.
Operational impact
There's no operational impact.