feat: supporing hybrid search group_by #35982

MrPresent-Han
Contributor

related: #35096

@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. area/internal-api labels Sep 4, 2024
@mergify mergify bot added the dco-passed DCO check passed. label Sep 4, 2024
Contributor

mergify bot commented Sep 4, 2024

@MrPresent-Han

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
     • feat: for introducing a new feature.
     • fix: for bug fixes.
     • enhance: for improvements to existing functionality.
     • test: for adding tests to existing functionality.
     • doc: for modifying documentation.
     • auto: for pull requests from a bot.
  2. Description Requirement: The PR must include a non-empty description, detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.

Contributor

mergify bot commented Sep 4, 2024

@MrPresent-Han The E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from cf390fc to fe2a9f2 Compare September 4, 2024 12:29
Contributor

mergify bot commented Sep 4, 2024

@MrPresent-Han The E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this pull request Sep 4, 2024
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from fe2a9f2 to e05bb3e Compare September 4, 2024 14:07
Contributor

mergify bot commented Sep 4, 2024

@MrPresent-Han The E2E Jenkins job failed; comment /run-cpu-e2e to trigger the job again.

MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this pull request Sep 5, 2024
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from e05bb3e to 3249e4d Compare September 5, 2024 04:15
@mergify mergify bot added the ci-passed label Sep 5, 2024

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 55 lines in your changes missing coverage. Please review.

Project coverage is 72.63%. Comparing base (5247631) to head (4b86fe6).
Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/proxy/search_reduce_util.go | 82.94% | 25 Missing and 19 partials ⚠️ |
| internal/querynodev2/services.go | 83.33% | 2 Missing and 1 partial ⚠️ |
| internal/proxy/task_search.go | 91.66% | 1 Missing and 1 partial ⚠️ |
| internal/querynodev2/segments/result.go | 80.00% | 1 Missing and 1 partial ⚠️ |
| internal/querynodev2/segments/search_reduce.go | 90.00% | 0 Missing and 2 partials ⚠️ |
| internal/util/reduce/reduce_info.go | 95.00% | 2 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #35982      +/-   ##
==========================================
- Coverage   81.57%   72.63%   -8.94%     
==========================================
  Files        1266     1267       +1     
  Lines      150740   150904     +164     
==========================================
- Hits       122959   109609   -13350     
- Misses      22911    36409   +13498     
- Partials     4870     4886      +16     
| Files with missing lines | Coverage Δ |
|---|---|
| internal/core/src/query/SearchOnSealed.cpp | 0.00% <ø> (-100.00%) ⬇️ |
| internal/datanode/compaction/merge_sort.go | 72.83% <ø> (ø) |
| internal/proxy/search_util.go | 82.17% <100.00%> (+0.17%) ⬆️ |
| internal/proxy/task.go | 89.77% <ø> (ø) |
| internal/querynodev2/delegator/delegator.go | 87.42% <100.00%> (+0.28%) ⬆️ |
| internal/querynodev2/handlers.go | 79.93% <100.00%> (-0.31%) ⬇️ |
| internal/proxy/task_search.go | 77.41% <91.66%> (+0.46%) ⬆️ |
| internal/querynodev2/segments/result.go | 71.95% <80.00%> (+4.84%) ⬆️ |
| internal/querynodev2/segments/search_reduce.go | 86.66% <90.00%> (-1.66%) ⬇️ |
| internal/util/reduce/reduce_info.go | 95.00% <95.00%> (ø) |

... and 2 more

... and 232 files with indirect coverage changes

MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this pull request Sep 5, 2024
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from 3249e4d to 2ecb24a Compare September 5, 2024 08:13
@mergify mergify bot removed the ci-passed label Sep 5, 2024
offset int64
queryInfo *planpb.QueryInfo
func reduceSearchResult(ctx context.Context, subSearchResultData []*schemapb.SearchResultData, reduceInfo *reduce.ResultInfo) (*milvuspb.SearchResults, error) {
if reduceInfo.GetGroupByFieldId() > 0 {
Contributor Author

note here

}
}
// reducing nq * topk results
for nqIdx := int64(0); nqIdx < nq; nqIdx++ {
Contributor Author

When reducing for advanced group-by, we just merge the results from the different delegators; we do not actually reduce or sort them.
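
A minimal sketch of what "merge without reducing" could look like, assuming each delegator returns its hits laid out per query (nq). The `SubResult` type and the `mergeWithoutSort` helper are hypothetical names for illustration and are not taken from the Milvus code base:

```go
package main

import "fmt"

// SubResult is a hypothetical, simplified stand-in for one delegator's
// search result: IDs[q] and Scores[q] hold the hits for the q-th query.
type SubResult struct {
	IDs    [][]int64
	Scores [][]float32
}

// mergeWithoutSort concatenates the hits of every delegator per query.
// It deliberately neither sorts the hits nor truncates them to topK;
// in this sketch that work is left to a later stage.
func mergeWithoutSort(nq int, subs []SubResult) SubResult {
	merged := SubResult{
		IDs:    make([][]int64, nq),
		Scores: make([][]float32, nq),
	}
	for _, sub := range subs {
		for q := 0; q < nq && q < len(sub.IDs) && q < len(sub.Scores); q++ {
			merged.IDs[q] = append(merged.IDs[q], sub.IDs[q]...)
			merged.Scores[q] = append(merged.Scores[q], sub.Scores[q]...)
		}
	}
	return merged
}

func main() {
	a := SubResult{IDs: [][]int64{{1, 2}}, Scores: [][]float32{{0.9, 0.5}}}
	b := SubResult{IDs: [][]int64{{3}}, Scores: [][]float32{{0.7}}}
	out := mergeWithoutSort(1, []SubResult{a, b})
	fmt.Println(out.IDs[0], out.Scores[0])
}
```

The hits keep the order in which the delegators were visited, which is the point of the sketch: ordering and truncation are postponed rather than done at this level.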

}
return false
}

Contributor Author

Different score strategies are used here to prepare different scorers.
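
As an illustration only, selecting a scorer from a strategy could be sketched as below. The `Scorer` interface, the `newScorer` factory, the strategy names `"rrf"` and `"weighted"`, and both formulas are assumptions made for this sketch; the PR's actual types may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// Scorer (hypothetical) converts one hit of one sub-search into a score
// that can be compared across sub-searches.
type Scorer interface {
	Score(subSearchIdx, rank int, rawScore float32) float32
}

// rrfScorer ignores the raw score and uses only the 0-based rank,
// following the usual reciprocal-rank-fusion form 1 / (k + rank + 1).
type rrfScorer struct{ k float32 }

func (s rrfScorer) Score(_ int, rank int, _ float32) float32 {
	return 1 / (s.k + float32(rank+1))
}

// weightedScorer multiplies the raw score by the weight of its sub-search.
// The caller must supply one weight per sub-search.
type weightedScorer struct{ weights []float32 }

func (s weightedScorer) Score(subSearchIdx, _ int, rawScore float32) float32 {
	return s.weights[subSearchIdx] * rawScore
}

// newScorer picks the scorer that matches the requested strategy.
func newScorer(strategy string, k float32, weights []float32) (Scorer, error) {
	switch strategy {
	case "rrf":
		return rrfScorer{k: k}, nil
	case "weighted":
		if len(weights) == 0 {
			return nil, errors.New("weighted strategy needs at least one weight")
		}
		return weightedScorer{weights: weights}, nil
	default:
		return nil, fmt.Errorf("unknown score strategy %q", strategy)
	}
}

func main() {
	s, err := newScorer("weighted", 0, []float32{0.5, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Score(1, 0, 0.25))
}
```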

return nil
}

func rankSearchResultDataByPk(ctx context.Context,
Contributor Author

For normal hybrid search, this rank branch is executed.
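
A rough sketch of ranking by primary key, assuming the hits of all sub-searches have already been scored and that scores for the same primary key are summed. The `Hit` type and the `rankByPK` function are hypothetical and simplified; they are not the signature of `rankSearchResultDataByPk`:

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical scored result identified by its primary key.
type Hit struct {
	PK    int64
	Score float32
}

// rankByPK sums the scores of hits that share a primary key across all
// sub-searches, sorts by descending score (ties broken by ascending PK
// so the output is deterministic), and keeps at most topK entries.
func rankByPK(subResults [][]Hit, topK int) []Hit {
	sum := make(map[int64]float32)
	for _, hits := range subResults {
		for _, h := range hits {
			sum[h.PK] += h.Score
		}
	}
	ranked := make([]Hit, 0, len(sum))
	for pk, score := range sum {
		ranked = append(ranked, Hit{PK: pk, Score: score})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Score != ranked[j].Score {
			return ranked[i].Score > ranked[j].Score
		}
		return ranked[i].PK < ranked[j].PK
	})
	if topK >= 0 && topK < len(ranked) {
		ranked = ranked[:topK]
	}
	return ranked
}

func main() {
	vectorA := []Hit{{PK: 1, Score: 0.5}, {PK: 2, Score: 0.25}}
	vectorB := []Hit{{PK: 2, Score: 0.5}, {PK: 3, Score: 0.125}}
	for _, h := range rankByPK([][]Hit{vectorA, vectorB}, 2) {
		fmt.Println(h.PK, h.Score)
	}
}
```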

@@ -375,12 +381,7 @@ func (sd *shardDelegator) Search(ctx context.Context, req *querypb.SearchRequest
}
results[i] = result
}
var ret *internalpb.SearchResults
Contributor Author

I removed the useless reduce here and return the results directly, so that the reduction happens outside, in searchChannel.

result, err2 = segments.ReduceSearchResults(ctx, toReduceResults, segments.NewReduceInfo(req.Req.GetNq(),
req.Req.GetTopk(),
req.Req.GetExtraSearchParam().GetGroupByFieldId(),
req.Req.GetExtraSearchParam().GetGroupSize(),
Contributor Author

Currently, in the Milvus query framework, a single request triggered by the proxy searches or queries only one channel. So I simplified the code here and removed the unnecessary reduction across different channels, which can never happen.

@MrPresent-Han
Contributor Author

rerun ut

@mergify mergify bot added the ci-passed label Sep 5, 2024
groupByFieldId int64
groupSize int64
isAdvance bool
}
Contributor

Add modify options.

Contributor Author

changed
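
If the request above is read as asking for setter-style options on the reduce info struct (an interpretation, not something the thread confirms), a minimal sketch could look like this. The `ResultInfo` type here, its constructor, and the `With...` and `Get...` method names are hypothetical:

```go
package main

import "fmt"

// ResultInfo is a hypothetical, trimmed-down reduce descriptor that
// mirrors the fields visible in the diff fragment above.
type ResultInfo struct {
	nq             int64
	topK           int64
	groupByFieldID int64
	groupSize      int64
	isAdvance      bool
}

// NewResultInfo sets the mandatory fields; everything else is optional
// and is set through the chained With... methods.
func NewResultInfo(nq, topK int64) *ResultInfo {
	return &ResultInfo{nq: nq, topK: topK}
}

func (r *ResultInfo) WithGroupByFieldID(id int64) *ResultInfo {
	r.groupByFieldID = id
	return r
}

func (r *ResultInfo) WithGroupSize(size int64) *ResultInfo {
	r.groupSize = size
	return r
}

func (r *ResultInfo) WithAdvance(advance bool) *ResultInfo {
	r.isAdvance = advance
	return r
}

func (r *ResultInfo) GetGroupByFieldID() int64 { return r.groupByFieldID }
func (r *ResultInfo) GetGroupSize() int64      { return r.groupSize }
func (r *ResultInfo) GetIsAdvance() bool       { return r.isAdvance }

func main() {
	info := NewResultInfo(2, 10).WithGroupByFieldID(101).WithGroupSize(3).WithAdvance(true)
	// A reducer can branch on the group-by field as in the diff above.
	if info.GetGroupByFieldID() > 0 {
		fmt.Println("group-by reduce, group size:", info.GetGroupSize())
	}
}
```

Chained setters keep the constructor short as optional fields are added, at the cost of allowing the struct to be mutated after construction.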

MrPresent-Han pushed a commit to MrPresent-Han/milvus that referenced this pull request Sep 6, 2024
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from 2ecb24a to acf0c3f Compare September 6, 2024 04:03
@mergify mergify bot removed the ci-passed label Sep 6, 2024
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the support-hybrid-search-groupby-review branch from acf0c3f to 4b86fe6 Compare September 6, 2024 06:03
@mergify mergify bot added the ci-passed label Sep 6, 2024
@@ -403,6 +404,11 @@ func (t *searchTask) initAdvancedSearchRequest(ctx context.Context) error {
zap.Int64s("plan.OutputFieldIds", plan.GetOutputFieldIds()),
zap.Stringer("plan", plan)) // may be very large if large term passed.
}
if len(t.queryInfos) > 0 {
Contributor

TODO: parse from rankParams

@czs007
Contributor

czs007 commented Sep 8, 2024

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, MrPresent-Han

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@czs007 czs007 changed the title supporing hybrid search group_by feat: supporing hybrid search group_by Sep 8, 2024
@mergify mergify bot added kind/feature Issues related to feature request from users and removed ci-passed do-not-merge/invalid-pr-format labels Sep 8, 2024
@sre-ci-robot sre-ci-robot merged commit e480b10 into milvus-io:master Sep 8, 2024
13 of 16 checks passed
chyezh pushed a commit to chyezh/milvus that referenced this pull request Sep 11, 2024
related: milvus-io#35096

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Labels
approved, area/internal-api, dco-passed (DCO check passed.), kind/feature (Issues related to feature request from users), lgtm, size/XXL (Denotes a PR that changes 1000+ lines.)
3 participants