Allow start many replication pollers in one shard #3790

Merged
yux0 merged 7 commits into temporalio:master from start-replication-poller on Jan 13, 2023

Conversation

yux0 (Contributor) commented Jan 9, 2023

What changed?
Allow starting multiple replication pollers in one shard.

Why?
This lets one shard start multiple replication pollers to poll replication tasks from the source cluster.

How did you test it?
Unit tests.

Potential risks

Is hotfix candidate?

@yux0 yux0 requested a review from a team as a code owner January 9, 2023 23:54
processor.Stop()
delete(r.taskProcessors, perShardTaskProcessorKey)
}
if clusterInfo := newClusterMetadata[clusterName]; clusterInfo != nil && clusterInfo.Enabled {
Contributor:
what if a cluster exists in both old & new cluster metadata? stop & restart?
what if a cluster only exists in the new cluster metadata? noop?

Contributor Author:

If a cluster exists in both the old and new metadata, the metadata was updated, so we stop the processor and restart it to load the new metadata. If a cluster exists only in the new metadata, it was newly added, so we just start a new processor.
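A minimal sketch of that reconciliation, with illustrative names (`ClusterInfo`, `TaskProcessor`, `newProcessor` are stand-ins, not the exact types in temporalio/temporal):

```go
package replication

// Illustrative stand-ins for the real temporalio/temporal types.
type ClusterInfo struct {
	Enabled bool
}

type TaskProcessor interface {
	Start()
	Stop()
}

type replicatorImpl struct {
	taskProcessors map[string]TaskProcessor
	newProcessor   func(clusterName string, info *ClusterInfo) TaskProcessor
}

// handleClusterMetadataUpdate applies the policy above: stop every
// processor whose cluster appears in the old metadata, then (re)start a
// processor for every enabled cluster in the new metadata. A cluster in
// both maps is therefore stopped and restarted with the new metadata;
// a cluster only in the new map simply gets a fresh processor.
func (r *replicatorImpl) handleClusterMetadataUpdate(
	oldMeta, newMeta map[string]*ClusterInfo,
) {
	for clusterName := range oldMeta {
		if p, ok := r.taskProcessors[clusterName]; ok {
			p.Stop()
			delete(r.taskProcessors, clusterName)
		}
	}
	for clusterName, info := range newMeta {
		if info != nil && info.Enabled {
			p := r.newProcessor(clusterName, info)
			p.Start()
			r.taskProcessors[clusterName] = p
		}
	}
}
```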

@@ -401,6 +403,7 @@ func (p *taskProcessorImpl) convertTaskToDLQTask(
// NOTE: last event vs next event, next event ID is exclusive
nextEventID := lastEvent.GetEventId() + 1

// TODO: GetShardID will break GetDLQReplicationMessages we need to handle DLQ for cross shard replication.
Contributor:

Why will it break? Using local shardID for storing DLQ seems the right behavior to me.

Contributor Author:

Yes, here it is fine. But when hydrating the replication task from the source cluster, it needs to calculate the shard ID instead of using this one, because the two clusters may have different shard counts. We will not hit this case in the short term.
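To illustrate why, a hedged sketch: shard ownership is a hash of the workflow modulo the cluster's own shard count, so clusters with different shard counts map the same workflow to different shard IDs. The FNV hash below is an illustrative assumption, not the project's actual mapping (temporalio/temporal has its own in `common.WorkflowIDToHistoryShard`):

```go
package replication

import "hash/fnv"

// workflowToShard is an illustrative mapping from a workflow to a shard.
// Shard IDs are 1-based, so add 1 after taking the modulus.
func workflowToShard(namespaceID, workflowID string, numShards int32) int32 {
	h := fnv.New32a()
	h.Write([]byte(namespaceID))
	h.Write([]byte(workflowID))
	return int32(h.Sum32()%uint32(numShards)) + 1
}

// The same workflow lands on different shards when the source cluster's
// shard count differs from the local one, e.g.:
//   sourceShard := workflowToShard(ns, wf, 512)  // source: 512 shards
//   localShard  := workflowToShard(ns, wf, 4096) // local: 4096 shards
```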

WithMaximumInterval(config.ReplicationTaskProcessorErrorRetryMaxInterval(shardID)).
WithMaximumAttempts(config.ReplicationTaskProcessorErrorRetryMaxAttempts(shardID)).
WithExpirationInterval(config.ReplicationTaskProcessorErrorRetryExpiration(shardID))
taskRetryPolicy := backoff.NewExponentialRetryPolicy(config.ReplicationTaskProcessorErrorRetryWait(pollingShardID)).
Contributor:

This config cannot be based on pollingShardID, right? Or why would it be?

Contributor:

If you have 3 clusters and the shard counts are all different, then what will you specify in the config?

Contributor Author:

Yes, you are right. I am going to revert this.
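For reference, a sketch of the reverted construction, keying every retry setting off the local shardID (built only from the accessors already shown in the snippet above):

```go
taskRetryPolicy := backoff.NewExponentialRetryPolicy(config.ReplicationTaskProcessorErrorRetryWait(shardID)).
	WithMaximumInterval(config.ReplicationTaskProcessorErrorRetryMaxInterval(shardID)).
	WithMaximumAttempts(config.ReplicationTaskProcessorErrorRetryMaxAttempts(shardID)).
	WithExpirationInterval(config.ReplicationTaskProcessorErrorRetryExpiration(shardID))
```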

@yux0 yux0 merged commit 51fd1dd into temporalio:master Jan 13, 2023
@yux0 yux0 deleted the start-replication-poller branch January 13, 2023 18:35
yux0 added a commit that referenced this pull request Jan 13, 2023
* Allow start many replication poller in one shard