Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance: add request resource for lru loading process(#32205) #32452

Merged

Conversation

MrPresent-Han
Copy link
Contributor

related: #32205

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrPresent-Han
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Apr 19, 2024
Copy link
Contributor Author

@MrPresent-Han MrPresent-Han left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for review

Loader[K comparable, V any] func(key K) (V, bool)
Finalizer[K comparable, V any] func(key K, value V) error
Loader[K comparable, V any] func(ctx context.Context, key K) (V, error)
Finalizer[K comparable, V any] func(ctx context.Context, key K, value V) error
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two good points for adding context:

  1. avoid pointless wait when context has been canceled
  2. make log output able to be tracked by traceID

log.Warn("Failed to get disk cache for segment, wait and try again")
} else if err == merr.ErrSegmentLoadFailed {
log.Warn("Failed to load segment, wait and try again")
} else if err != nil {
return true, err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When loading failed, make cache to try again rather than simply failed. Maybe we need to enrich error returned here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be noted that the retry here is endless until succeeded or timeout. If the failure is determined (e.g., file broken, file not found, etc.), there would be likely a dead loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@@ -1504,20 +1531,20 @@ func getResourceUsageEstimateOfSegment(schema *schemapb.CollectionSchema, loadIn
loadInfo.GetSegmentID(),
fieldIndexInfo.GetBuildID())
}
segmentMemorySize += neededMemSize
if mmapEnabled {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the most notable change here, I make it include segment memory usage when calculating for mmap segment, this is the point needed discussion

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The memory cost on bloom filters should count.

My question is, provided that the memory cost on bloom filters are usually very low, why bother to change the code here? Is there any particular case that the BF memory cost made high?

segment *LocalSegment,
loadInfo *querypb.SegmentLoadInfo,
) (err error) {
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return a ErrOperationNotSupported error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@@ -1504,20 +1531,20 @@ func getResourceUsageEstimateOfSegment(schema *schemapb.CollectionSchema, loadIn
loadInfo.GetSegmentID(),
fieldIndexInfo.GetBuildID())
}
segmentMemorySize += neededMemSize
if mmapEnabled {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The memory cost on bloom filters should count.

My question is, provided that the memory cost on bloom filters are usually very low, why bother to change the code here? Is there any particular case that the BF memory cost made high?

@MrPresent-Han MrPresent-Han force-pushed the fix-oom-request-resource-lru-dev-review branch from 8986018 to ad4c937 Compare April 19, 2024 06:07
@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. and removed size/L Denotes a PR that changes 100-499 lines. labels Apr 19, 2024
@tedxu
Copy link
Collaborator

tedxu commented Apr 19, 2024

/lgtm

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
@MrPresent-Han MrPresent-Han force-pushed the fix-oom-request-resource-lru-dev-review branch from ad4c937 to ad5ad7a Compare April 19, 2024 09:11
@sre-ci-robot sre-ci-robot removed the lgtm label Apr 19, 2024
@tedxu
Copy link
Collaborator

tedxu commented Apr 19, 2024

/lgtm

@tedxu tedxu merged commit 27cb486 into milvus-io:lru-dev Apr 19, 2024
6 of 9 checks passed
sunby pushed a commit to sunby/milvus that referenced this pull request Apr 22, 2024
Signed-off-by: chyezh <chyezh@outlook.com>

Add metric for lru and fix lost delete data when enable lazy load  (milvus-io#31868)

Signed-off-by: chyezh <chyezh@outlook.com>

feat: Support stream reduce v1 (milvus-io#31873)

related: milvus-io#31410

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Change do wait lru dev (milvus-io#31878)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

enhance: add config for disk cache (milvus-io#31881)

fix config not initialized (milvus-io#31890)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix error handle in search (milvus-io#31895)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix: thread safe vector (milvus-io#31898)

fix: insert record cannot reinsert (milvus-io#31900)

enhance: cancel concurrency restrict for stream reduce and add metrics (milvus-io#31892)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: bit set (milvus-io#31905)

fix bitset clear to reset (milvus-io#31908)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Fix 0404 lru dev (milvus-io#31914)

fix:
1. sealed_segment num_rows reset to std::null opt
2. sealed_segment lazy_load reset to true after evicting to avoid
shortcut

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix possible block due to unpin fifo activating principle (milvus-io#31924)

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Add lru reloader lru dev (milvus-io#31952)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix query limit (milvus-io#32060)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

fix: lru cache lost delete and wrong mem size (milvus-io#32072)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: add more metrics for cache and search (milvus-io#31777) (milvus-io#32097)

issue: milvus-io#30931

Signed-off-by: chyezh <chyezh@outlook.com>

fix:panic due to empty search result when stream reducing(milvus-io#32009) (milvus-io#32083)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: sealed segment may not exist when throw (milvus-io#32098)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

Major compaction 1st edition (milvus-io#31804) (milvus-io#32116)

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>

fix: inconsistent between state lock and load state (milvus-io#32171)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: Throw error instead of crash when index cannot be built (milvus-io#31844)

issue: milvus-io#27589

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

(cherry picked from commit 1b76766)
Signed-off-by: jaime <yun.zhang@zilliz.com>

update knowhere to support clustering (milvus-io#32188)

Signed-off-by: chasingegg <chao.gao@zilliz.com>

fix: segment release is not sync with cache (milvus-io#32212)

issue: milvus-io#32206

Signed-off-by: chyezh <chyezh@outlook.com>

fix: incorrect pinCount resulting unexpected eviction(milvus-io#32136) (milvus-io#32238)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: possible panic when stream reducing (milvus-io#32247)

related: milvus-io#32009

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

enhance: [lru-dev] add the related data size for the read apis (milvus-io#32274)

cherry-pick: milvus-io#31816

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>

add debug log (milvus-io#32303)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

Refine code for analyze task scheduler (milvus-io#32122)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

fix: memory leak on stream reduce (milvus-io#32345)

related: milvus-io#32304

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

feat: adding cache stats support (milvus-io#32344)

See milvus-io#32067

Signed-off-by: Ted Xu <ted.xu@zilliz.com>

Fix bug for version (milvus-io#32363)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

fix: remove sub entity in load delta log, update entity num in segment itself (milvus-io#32350)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

fix: clear data when loading failure (milvus-io#32370)

issue: milvus-io#30361

Signed-off-by: chyezh <chyezh@outlook.com>

fix: stream reduce memory leak for failing to release stream reducer(milvus-io#32345) (milvus-io#32381)

related: milvus-io#32345

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

Keep InProgress state when getting task state is init (milvus-io#32394)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>

add log for search failed (milvus-io#32367)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

enable asan by default (milvus-io#32423)

Signed-off-by: sunby <sunbingyi1992@gmail.com>

Major compaction refactoring (milvus-io#32149)

Signed-off-by: wayblink <anyang.wang@zilliz.com>

Lru dev debug (milvus-io#32414)

Co-authored-by: wayblink <anyang.wang@zilliz.com>

fix: protect loadInfo with atomic, remove rlock at cache to avoid dead lock (milvus-io#32436)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

fix: use Get but not GetBy of SegmentManager (milvus-io#32438)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

fix: return growing segment when sealed (milvus-io#32460)

issue: milvus-io#32435

Signed-off-by: chyezh <chyezh@outlook.com>

enhance: add request resource for lru loading process(milvus-io#32205) (milvus-io#32452)

related: milvus-io#32205

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: unexpected deleted index files when lazy loading(milvus-io#32136) (milvus-io#32469)

related: milvus-io#32136

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>

fix: reference count leak cause release blocked (milvus-io#32465)

issue: milvus-io#32379

Signed-off-by: chyezh <chyezh@outlook.com>

Fix compaction fail (milvus-io#32473)

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm size/XL Denotes a PR that changes 500-999 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants