table checkpoint-ts can be larger than redo-resolved-ts #9769

Closed
hicqu opened this issue Sep 18, 2023 · 5 comments · Fixed by #9772
Labels
affects-6.5 affects-7.1 area/ticdc Issues or PRs related to TiCDC. report/community The community has encountered this bug. severity/moderate type/bug The issue is confirmed as a bug.

Comments

@hicqu
Contributor

hicqu commented Sep 18, 2023

What did you do?

Here is a piece of TiCDC log:

egv-transactions-ch2-ticdc-2 ticdc [2023/09/15 20:04:27.713 +05:30] [INFO] [manager.go:850] ["Start table sink"] [namespace=default] [changefeed=egv-transactions-1-hyd] [tableID=74] [startTs=444278613643624570]
egv-transactions-ch2-ticdc-2 ticdc [2023/09/15 20:04:27.713 +05:30] [INFO] [table_sink_wrapper.go:145] ["Sink is started"] [namespace=default] [changefeed=egv-transactions-1-hyd] [tableID=74] [startTs=444278613643624570] [replicateTs=444278628074651689]
egv-transactions-ch2-ticdc-2 ticdc [2023/09/15 20:04:27.713 +05:30] [INFO] [processor.go:325] ["Add Table finished"] [captureID=18f5d46d-f065-4214-b118-b63dd3dc4fb9] [namespace=default] [changefeed=egv-transactions-1-hyd] [tableID=74] [tableResolvedTs=444278613643624570] [tableCheckpointTs=444278613643624570] [globalCheckpointTs=444278613643624570] [state=Replicating] [isPrepare=false]
egv-transactions-ch2-ticdc-2 ticdc [2023/09/15 20:04:27.813 +05:30] [ERROR] [manager.go:1019] ["sinkManager: resolved ts should not less than checkpoint ts"] [namespace=default] [changefeed=egv-transactions-1-hyd] [tableID=74] [resolvedTs=444278613643624570] [checkpointTs="{\"Mode\":0,\"Ts\":444278627681435676,\"BatchID\":18446744073709551615}"]

What did you expect to see?

No response

What did you see instead?

For a given table, checkpoint-ts is larger than redo-resolved-ts.
This check happens at cdc/processor/sinkmanager/manager.go:1019.
TiCDC version is v6.5.4.
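
For reference, the violated invariant is roughly the following (a minimal sketch in Go, not the actual code at cdc/processor/sinkmanager/manager.go:1019; the type and function names here are illustrative):

// Sketch of the per-table invariant that the log line above reports.
// Names are illustrative, not the real TiCDC types.
package main

import "log"

type tableProgress struct {
	resolvedTs   uint64 // redo-resolved-ts of the table
	checkpointTs uint64 // sink checkpoint-ts of the table
}

// checkProgress logs an error when checkpointTs runs ahead of
// resolvedTs, which is what "resolved ts should not less than
// checkpoint ts" means in the TiCDC log.
func checkProgress(p tableProgress) {
	if p.resolvedTs < p.checkpointTs {
		log.Printf("sinkManager: resolved ts should not less than checkpoint ts: resolvedTs=%d checkpointTs=%d",
			p.resolvedTs, p.checkpointTs)
	}
}

func main() {
	// Values taken from the log above: checkpointTs > resolvedTs,
	// so the invariant is violated.
	checkProgress(tableProgress{
		resolvedTs:   444278613643624570,
		checkpointTs: 444278627681435676,
	})
}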

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

(paste TiCDC version here)
@hicqu hicqu added type/bug The issue is confirmed as a bug. area/ticdc Issues or PRs related to TiCDC. labels Sep 18, 2023
@nongfushanquan
Contributor

/assign @CharlesCheung96

@hicqu
Contributor Author

hicqu commented Sep 20, 2023

It's reproduced. It seems that many tables (more than 200) and more than one capture are required to trigger it.

[2023/09/20 13:33:44.178 +08:00] [ERROR] [processor.go:646] ["run processor failed"] [capture=668dd527-ebd6-4f63-b2d9-313966635325] [namespace=default] [changefeed=x] [error="[CDC:ErrInvalidCheckpointTs]checkpointTs(444383368798535681) should not larger than resolvedTs(444383368052736000)"] [errorVerbose="[CDC:ErrInvalidCheckpointTs]checkpointTs(444383368798535681) should not larger than resolvedTs(444383368052736000)\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20220729040631-518f63d66278/normalize.go:164\ngithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent.(*agent).handleMessageHeartbeat\n\tgithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent/agent.go:266\ngithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent.(*agent).handleMessage\n\tgithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent/agent.go:234\ngithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent.(*agent).Tick\n\tgithub.com/pingcap/tiflow/cdc/scheduler/internal/v3/agent/agent.go:186\ngithub.com/pingcap/tiflow/cdc/processor.(*processor).tick\n\tgithub.com/pingcap/tiflow/cdc/processor/processor.go:698\ngithub.com/pingcap/tiflow/cdc/processor.(*processor).Tick\n\tgithub.com/pingcap/tiflow/cdc/processor/processor.go:592\ngithub.com/pingcap/tiflow/cdc/processor.(*managerImpl).Tick\n\tgithub.com/pingcap/tiflow/cdc/processor/manager.go:133\ngithub.com/pingcap/tiflow/pkg/orchestrator.(*EtcdWorker).Run\n\tgithub.com/pingcap/tiflow/pkg/orchestrator/etcd_worker.go:290\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).runEtcdWorker\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:548\ngithub.com/pingcap/tiflow/cdc/capture.(*captureImpl).run.func4\n\tgithub.com/pingcap/tiflow/cdc/capture/capture.go:380\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650"]
[2023/09/20 13:33:53.891 +08:00] [ERROR] [feed_state_manager.go:452] ["processor reports an error"] [namespace=default] [changefeed=x] [captureID=668dd527-ebd6-4f63-b2d9-313966635325] [error="{\"time\":\"2023-09-20T13:33:53.889723519+08:00\",\"addr\":\"10.2.7.2:8300\",\"code\":\"CDC:ErrInvalidCheckpointTs\",\"message\":\"[CDC:ErrInvalidCheckpointTs]checkpointTs(444383368798535681) should not larger than resolvedTs(444383368052736000)\"}"]

@hicqu
Contributor Author

hicqu commented Sep 20, 2023

#9772 has been tested manually; it resolves the problem.

@hicqu
Contributor Author

hicqu commented Sep 20, 2023

This is caused by the function cdc/scheduler/internal/v3/replication.Manager.AdvanceCheckpoint, which can advance BarrierTs faster than RedoResolvedTs. Once BarrierTs has been advanced to a position, CheckpointTs can be advanced to that position as well, so the check that CheckpointTs must be less than RedoResolvedTs can fail unexpectedly.

If consistent.level is set to none, this problem won't happen because RedoResolvedTs won't be used to calculate BarrierTs.

This problem won't cause data loss. It was introduced in #9178, which means it only affects v6.5.4.
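
The fix direction, roughly, is to make sure the barrier used when advancing the checkpoint never exceeds RedoResolvedTs while redo is enabled. A minimal sketch of that clamp follows (illustrative only, not the actual change in #9772; nextCheckpointTs and its parameters are hypothetical names):

// Sketch of clamping the checkpoint target by the redo resolved ts.
// Illustrative only; not the real AdvanceCheckpoint implementation.
package main

import "fmt"

// nextCheckpointTs returns the checkpoint-ts to advance to, never
// exceeding the barrier-ts or, when redo is enabled, the redo-resolved-ts.
func nextCheckpointTs(candidateTs, barrierTs, redoResolvedTs uint64, redoEnabled bool) uint64 {
	limit := barrierTs
	if redoEnabled && redoResolvedTs < limit {
		// Without this clamp, BarrierTs (and then CheckpointTs) can be
		// advanced past RedoResolvedTs, triggering ErrInvalidCheckpointTs.
		limit = redoResolvedTs
	}
	if candidateTs > limit {
		return limit
	}
	return candidateTs
}

func main() {
	// With the values from the error above, the checkpoint is capped
	// at the redo resolved ts instead of running ahead of it.
	fmt.Println(nextCheckpointTs(444383368798535681, 444383368900000000, 444383368052736000, true))
}

With such a clamp in place, CheckpointTs can never be advanced past RedoResolvedTs, so the check at manager.go:1019 holds.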

@asddongmen asddongmen reopened this Oct 5, 2023
CharlesCheung96 pushed a commit to CharlesCheung96/tiflow that referenced this issue Oct 7, 2023
CharlesCheung96 pushed a commit to CharlesCheung96/tiflow that referenced this issue Oct 25, 2023
CharlesCheung96 pushed a commit to ti-chi-bot/tiflow that referenced this issue Nov 1, 2023
@seiya-annie

/found community

@ti-chi-bot ti-chi-bot bot added the report/community The community has encountered this bug. label Jun 13, 2024