Have easier to way to tell the progress of compactor (and downsampling) #3985

Closed
bwplotka opened this issue Mar 29, 2021 · 8 comments · Fixed by #4801

Comments

@bwplotka
Member

Currently, you need to open the Thanos Compact UI and check that all older blocks are the bigger ones. There should be only up to 5 of 2h, 8h and 2d blocks; the rest should be compacted to 2w. The same applies to downsampled blocks.

Or you can check the number of compactions per day to see if the number stabilizes.

Both ways are pretty manual. I would propose adding a metric that shows the backlog of compactions still to be done. This potentially requires changing our compaction planner logic, which is already planned in #3405.

@2nick
Contributor

2nick commented May 17, 2021

I'm trying to figure out a way to report progress, and I think there are 2 possible options:

  1. API with JSON response
  2. Prometheus metrics

In my opinion, for both options it would be enough to expose:

  • block id
  • group id for the block
  • compaction state of the group
  • compaction state for the block

It's probably reasonable to use the time of the block's state change as the metric value.

Of course, it's possible to implement our own readers for all stages to report progress in bytes, but that looks like overkill.
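
For illustration only, here is a minimal Go sketch of what option 1 (a JSON response) could carry for the four items listed above plus the state-change time. Every name in it is hypothetical: the `blockProgress` type, the JSON field names, the state strings and the `render` helper are assumptions made for this example, not an existing Thanos API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// blockProgress is a hypothetical response item. Field names and state
// values are assumptions for illustration, not part of Thanos.
type blockProgress struct {
	BlockID        string    `json:"block_id"`
	GroupID        string    `json:"group_id"`
	GroupState     string    `json:"group_state"`
	BlockState     string    `json:"block_state"`
	StateChangedAt time.Time `json:"state_changed_at"`
}

// render encodes the progress items as a JSON array.
func render(items []blockProgress) (string, error) {
	out, err := json.Marshal(items)
	return string(out), err
}

// example returns one made-up item so the sketch can be run as-is.
func example() []blockProgress {
	return []blockProgress{{
		BlockID:        "b1",
		GroupID:        "g1",
		GroupState:     "compacting",
		BlockState:     "planned",
		StateChangedAt: time.Unix(0, 0).UTC(),
	}}
}

func main() {
	s, err := render(example())
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The same fields could map to metric labels for option 2, with the state-change timestamp as the sample value, though one label set per block may be expensive for large buckets.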

WDYT?

@stale

stale bot commented Jul 17, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 17, 2021
@vanugrah
Contributor

Yes please! I spent the weekend trying to answer that question since we have a huge compaction backlog and I couldn't think of a simple instrumentation to add. Largely because currently compaction planning is iterative, so we'd need to simulate multiple plan invocations per group to accurately determine how many compaction runs would need to happen per group to reach the desired state.

As a user - I'd love to see an overall compaction percentage for the bucket as well as compaction progress for each group. I'm going to think more about this problem and report back.

@stale stale bot removed the stale label Jul 28, 2021
@kernelpanic77

Hello @bwplotka! I am relatively new to the Thanos project but would love to get involved and potentially contribute. Are there any pointers or resources for getting started with Thanos, so that I can get a better understanding of the project and this issue?

Thanks,
Ishan

@bwplotka bwplotka changed the title from "Have easier to way to tell the progress of compactor" to "Have easier to way to tell the progress of compactor (and downsampling)" on Oct 12, 2021
@yeya24
Contributor

yeya24 commented Oct 12, 2021

> Yes please! I spent the weekend trying to answer that question since we have a huge compaction backlog and I couldn't think of a simple instrumentation to add. Largely because currently compaction planning is iterative, so we'd need to simulate multiple plan invocations per group to accurately determine how many compaction runs would need to happen per group to reach the desired state.
>
> As a user - I'd love to see an overall compaction percentage for the bucket as well as compaction progress for each group. I'm going to think more about this problem and report back.

Sounds like a promising way to go!
We can have a single goroutine to run the planning simulation process. In that goroutine, we use the grouper and planner to do planning based on metadata from the fetchers.
We keep planning until no plan is available and count the number of iterations needed. This number represents the compactor's progress.
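
A rough sketch of that loop, to make the idea concrete. This is a toy under stated assumptions, not the Thanos grouper or planner: the `block` type, the `plan` rule (merge any aligned window that holds at least two smaller blocks) and the `twoHourBlocks` helper are all hypothetical. Only the 2h/8h/2d/2w ranges come from this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical stand-in for block metadata; only the time range
// (in hours, end exclusive) matters for this toy planner.
type block struct{ start, end int }

// ranges are the compaction ranges mentioned in this thread: 2h, 8h, 2d, 2w.
var ranges = []int{2, 8, 48, 336}

// plan returns the indexes of the blocks to merge next, or nil when nothing
// is left to do. Simplified rule (an assumption, not the real planner): for
// each range above the smallest, merge the earliest aligned window that
// holds at least two blocks shorter than that range.
func plan(blocks []block) []int {
	for _, r := range ranges[1:] {
		windows := map[int][]int{}
		for i, b := range blocks {
			if b.end-b.start < r && b.start/r == (b.end-1)/r {
				windows[b.start/r] = append(windows[b.start/r], i)
			}
		}
		keys := make([]int, 0, len(windows))
		for k := range windows {
			keys = append(keys, k)
		}
		sort.Ints(keys) // deterministic: earliest window first
		for _, k := range keys {
			if len(windows[k]) >= 2 {
				return windows[k]
			}
		}
	}
	return nil
}

// simulate applies plans to an in-memory copy of the metadata until no plan
// is available and returns the number of iterations it took.
func simulate(blocks []block) int {
	blocks = append([]block(nil), blocks...)
	runs := 0
	for {
		picked := plan(blocks)
		if picked == nil {
			return runs
		}
		merged := blocks[picked[0]]
		drop := map[int]bool{}
		for _, i := range picked {
			drop[i] = true
			if blocks[i].start < merged.start {
				merged.start = blocks[i].start
			}
			if blocks[i].end > merged.end {
				merged.end = blocks[i].end
			}
		}
		next := []block{merged}
		for i, b := range blocks {
			if !drop[i] {
				next = append(next, b)
			}
		}
		blocks = next
		runs++
	}
}

// twoHourBlocks builds n consecutive 2h blocks starting at hour 0.
func twoHourBlocks(n int) []block {
	out := make([]block, n)
	for i := range out {
		out[i] = block{start: 2 * i, end: 2*i + 2}
	}
	return out
}

func main() {
	// Eight 2h blocks: two merges into 8h blocks, then one merge of those.
	fmt.Println("simulated compaction runs:", simulate(twoHourBlocks(8))) // 3 with this toy planner
}
```

The loop terminates because every iteration replaces at least two blocks with one. The returned count could be exported as the backlog value, one per group.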

Question:

  1. Each compaction iteration means different work to do. For example, a plan for 2 level 2 blocks and a plan for multiple level 4 blocks. How can we quantify the work?

@bwplotka
Member Author

What we discussed in our 1:2 with @yeya24 @metonymic-smokey :

The idea to simulate planning sounds amazing. It will take (marginally) more time and more complex code, but in the end we can (1) estimate compaction progress and (2) plan it better (optimize!)

> Each compaction iteration means different work to do. For example, a plan for 2 level 2 blocks and a plan for multiple level 4 blocks. How can we quantify the work?

We can estimate samples/bytes, but it will be an approximation (:
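
One possible way to turn that into a number, sketched in Go. The `plannedRun` type and the `progress` helper are hypothetical, and weighting each simulated run by the total samples of its input blocks is just one assumption about how the work could be approximated.

```go
package main

import "fmt"

// plannedRun is a hypothetical summary of one simulated compaction run.
type plannedRun struct {
	blocks  int    // number of input blocks in the plan
	samples uint64 // total samples across those blocks (the weight)
}

// progress returns the sample-weighted fraction of work already done.
// It is only an approximation: equal sample counts do not guarantee
// equal compaction cost.
func progress(done, pending []plannedRun) float64 {
	var d, p uint64
	for _, r := range done {
		d += r.samples
	}
	for _, r := range pending {
		p += r.samples
	}
	if d+p == 0 {
		return 1 // nothing to compact counts as fully done
	}
	return float64(d) / float64(d+p)
}

func main() {
	done := []plannedRun{{blocks: 2, samples: 300}}
	pending := []plannedRun{{blocks: 4, samples: 100}}
	fmt.Printf("%.0f%% compacted\n", 100*progress(done, pending))
}
```

Bytes could be swapped in for samples with the same shape of calculation.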

@metonymic-smokey
Contributor

Another idea that @yeya24 came up with is calculating retention progress, once we have finished working on compaction and downsampling progress. Broadly, it will also be on the same lines i.e. simulation and exporting metrics.

@yeya24
Contributor

yeya24 commented Nov 6, 2021

> Another idea that @yeya24 came up with is calculating retention progress, once we have finished working on compaction and downsampling progress. Broadly, it will also be on the same lines i.e. simulation and exporting metrics.

Let's have another issue to track this.
