Skip to content
This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

mydump,restore: Make 1 Chunk = 1 File and Use Coarse Row ID Assignment #109

Merged
merged 1 commit into from
Dec 26, 2018

Conversation

kennytm
Copy link
Collaborator

@kennytm kennytm commented Dec 22, 2018

What problem does this PR solve?

Implement RFC 2, reduces the 300G import time (which has a slow disk) from ~3¼ hours to ~2⅗ hours, improving the speed by about 18%.

What is changed and how it works?

As explained in the RFC,

  1. Now a chunk is always as large as the whole file
  2. Row IDs are no longer assigned consecutively, the IDs between two chunks will be separated by the theoretical maximum.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • Try to import 10G and 300G of data and verify the time taken is shorter than 6 minutes / 3 hours respectively.

Code changes

  • (We need to rewrite this section)

Side effects

  • Possible performance regression

Related changes

  • Need to update the documentation
  • Need to update the tidb-ansible repository
  • Need to be included in the release note

@kennytm kennytm added status/WIP Work in progress Should Update Docs Should update docs after this PR is merged. Remove this label once the docs are updated priority/important type/enhancement Performance improvement or refactoring labels Dec 22, 2018
@sre-bot
Copy link

sre-bot commented Dec 22, 2018

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

@kennytm kennytm changed the title [WIP] mydump,restore: implement RFC 2, eliminate one data file read mydump,restore: implement RFC 2, eliminate one data file read Dec 22, 2018
@kennytm kennytm added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP Work in progress labels Dec 22, 2018
@kennytm kennytm changed the title mydump,restore: implement RFC 2, eliminate one data file read mydump,restore: Make 1 Chunk = 1 File and Use Coarse Row ID Assignment Dec 22, 2018
@kennytm
Copy link
Collaborator Author

kennytm commented Dec 22, 2018

/run-all-tests

@kennytm
Copy link
Collaborator Author

kennytm commented Dec 22, 2018

PTAL @csuzhangxc @GregoryIan @lonng

@lonng
Copy link
Contributor

lonng commented Dec 24, 2018

Awesome 👍 @kennytm

Copy link
Contributor

@lonng lonng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lonng lonng added status/LGT1 One reviewer already commented LGTM (LGTM1) and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Dec 24, 2018
@IANTHEREAL
Copy link
Collaborator

Now I think the speed can be optimized, I'm not sure whether it will interfere with the optimization

A few tests need to relaxed due to the coarser ID assignment
// // check - rows num
// var tolRows int64 = 0
// for _, region := range regions {
// tolRows += region.Rows()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to update the name of Rows? because it is not real rows count now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would remove useless code later, all code about chunk are useless

@IANTHEREAL
Copy link
Collaborator

LGTM

@IANTHEREAL IANTHEREAL added status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) and removed status/LGT1 One reviewer already commented LGTM (LGTM1) labels Dec 26, 2018
@kennytm
Copy link
Collaborator Author

kennytm commented Dec 26, 2018

/run-all-tests

@kennytm kennytm merged commit da3feda into master Dec 26, 2018
@kennytm kennytm deleted the kennytm/1-chunk-1-file branch January 8, 2019 10:14
@kennytm kennytm added Should Update Ansible The config in TiDB-Ansible should be updated and removed Should Update Docs Should update docs after this PR is merged. Remove this label once the docs are updated labels Jan 8, 2019
@kennytm kennytm removed the Should Update Ansible The config in TiDB-Ansible should be updated label May 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2) type/enhancement Performance improvement or refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants