Make pageSize configurable when finding blobs to remove in GC #18861

Closed
wants to merge 1 commit into from

@NehemiahMi NehemiahMi commented Jun 28, 2023

If the database is slow and a single project contains too many blobs, the database query issued by the FindBlobsShouldUnassociatedWithProject function times out. The pageSize variable therefore needs to be configurable so that it can be adapted to this case.
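A minimal sketch of how a configurable page size could be resolved, assuming the value arrives as a raw string from some configuration source. The helper `resolvePageSize` and the constant `defaultPageSize` are hypothetical names, and the fallback of 1000 is the page size mentioned elsewhere in this thread; none of this is Harbor's actual configuration code.

```go
package main

import "strconv"

// defaultPageSize is an assumption: it mirrors the 1000-blob page
// discussed in this PR, not a value read from Harbor's source.
const defaultPageSize = 1000

// resolvePageSize is a hypothetical helper. It parses a raw configuration
// value and falls back to defaultPageSize when the value is empty,
// not a number, or not positive.
func resolvePageSize(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultPageSize
	}
	return n
}
```

With this shape, an operator on a slow database could supply a smaller value such as 100, while an unset or invalid value keeps the existing behavior.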

Thank you for contributing to Harbor!

Comprehensive Summary of your change

Issue being fixed

Fixes #(issue)

Please indicate you've done the following:

  • Well Written Title and Summary of the PR
  • Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

Signed-off-by: 伏鸾 <liuppengcheng.lpc@antgroup.com>
@wy65701436
Contributor

@NehemiahMi Have you noticed any performance issues with the database when running the garbage collector? Also, would it be possible for you to share performance data for page sizes of both 100 and 1000? It will be helpful for us to evaluate this PR.

@NehemiahMi
Author

NehemiahMi commented Jun 28, 2023

@wy65701436 The only PostgreSQL machine we use is very old. In our stress-test environment artifacts cannot be deleted (the gateway does not allow the delete REST API), so a lot of data was left behind after several stress tests. One project contains 5000+ blobs, and the SQL issued by FindBlobsShouldUnassociatedWithProject cannot complete. The GC schedule only works when the page size is set to a small value.

FindBlobsShouldUnassociatedWithProject uses the SQL below, which joins the two tables and matches against a page of 1000 blob digests:

SELECT b.digest_blob FROM artifact a, artifact_blob b WHERE a.digest = b.digest_af AND a.project_id = 2800335 AND b.digest_blob IN (....)
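The paging described above can be sketched in Go. This is an illustration under assumptions, not Harbor's actual code: `chunkDigests` and `buildQuery` are hypothetical helpers, the `$n` placeholders assume PostgreSQL-style positional parameters, and the fallback page size of 1000 is taken from this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkDigests splits digests into consecutive pages of at most pageSize
// items. A non-positive pageSize falls back to 1000 (assumed default).
func chunkDigests(digests []string, pageSize int) [][]string {
	if pageSize <= 0 {
		pageSize = 1000
	}
	var pages [][]string
	for start := 0; start < len(digests); start += pageSize {
		end := start + pageSize
		if end > len(digests) {
			end = len(digests)
		}
		pages = append(pages, digests[start:end])
	}
	return pages
}

// buildQuery returns the statement quoted above with one placeholder per
// digest in the page. $1 is reserved for the project ID, so the digest
// placeholders start at $2.
func buildQuery(n int) string {
	placeholders := make([]string, n)
	for i := range placeholders {
		placeholders[i] = fmt.Sprintf("$%d", i+2)
	}
	return "SELECT b.digest_blob FROM artifact a, artifact_blob b " +
		"WHERE a.digest = b.digest_af AND a.project_id = $1 " +
		"AND b.digest_blob IN (" + strings.Join(placeholders, ", ") + ")"
}
```

Each page would be run as its own statement, so a smaller page size trades more round trips for a shorter IN list per query.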

@codecov

codecov bot commented Jun 28, 2023

Codecov Report

Merging #18861 (8e3cc90) into main (d36ca80) will decrease coverage by 22.74%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             main   #18861       +/-   ##
===========================================
- Coverage   67.38%   44.64%   -22.74%     
===========================================
  Files         981      236      -745     
  Lines      106901    13101    -93800     
  Branches     2678     2678               
===========================================
- Hits        72030     5849    -66181     
+ Misses      30998     6960    -24038     
+ Partials     3873      292     -3581     
Flag Coverage Δ
unittests 44.64% <ø> (-22.74%) ⬇️

Flags with carried forward coverage won't be shown.

see 748 files with indirect coverage changes

@Vad1mo Vad1mo added the area/gc label Jul 14, 2023
@wy65701436
Contributor

> @wy65701436 The only PostgreSQL machine we use is very old. In our stress-test environment artifacts cannot be deleted (the gateway does not allow the delete REST API), so a lot of data was left behind after several stress tests. One project contains 5000+ blobs, and the SQL issued by FindBlobsShouldUnassociatedWithProject cannot complete. The GC schedule only works when the page size is set to a small value.
>
> FindBlobsShouldUnassociatedWithProject uses the SQL below, which joins the two tables and matches against a page of 1000 blob digests:
>
> SELECT b.digest_blob FROM artifact a, artifact_blob b WHERE a.digest = b.digest_af AND a.project_id = 2800335 AND b.digest_blob IN (....)

This is the first time I have heard about this issue, and I need concrete proof regarding the performance issue that occurs when querying blobs with a page size of 1000.

@wy65701436
Contributor

Hi @NehemiahMi, since you mentioned that your physical machine is old, I suggest upgrading it to a more powerful one. After the upgrade, run a performance regression test. If the issue still persists, please reopen this PR.

@wy65701436 wy65701436 closed this Aug 25, 2023
4 participants