Make segment merging parallel in SegmentMerger [LUCENE-8580] #9626
Comments
Dawid Weiss (@dweiss) (migrated from JIRA) About time, eh?
Vigya Sharma (@vigyasharma) (migrated from JIRA) I started looking into this, and had some early questions. What would be the right benchmark to look at for measuring any improvements here? Would it be the indexing throughput benchmark? Should we add a benchmark specifically for merge performance, like triggering a force merge? I suppose it won't be a measure of QPS, but rather the time to complete a merge on a given index. Do we have such benchmarks already? Apologies if these are obvious questions; I've only recently started looking at benchmarks. Will update here if I'm able to figure something out myself.
Vigya Sharma (@vigyasharma) (migrated from JIRA) I'm thinking of tackling this one data structure at a time, starting with postings. Extending @dweiss's patch, I was wondering if we could merge the fields part first: would it work to just spawn off multiple threads, one per field? TermsIndex has an FST per field, and I see we go field by field while writing terms and merging their postings, which makes this look viable. I'm not yet clear on whether this will require writing to multiple (tim/tip/tmd/other?) files, and if that is a problem (whether we'd need to stitch them together, which drains away all our concurrency gains, or whether we can explore ways to work with multiple files). I'll explore more on how the files get written. Want to check with the community for any pointers: am I on the right track here, and are there obvious wrinkles I should look at?
Michael Sokolov (@msokolov) (migrated from JIRA) It seems we have generally stuck with one or a handful of files per format; presumably we don't want to fragment into many small files, so that the OS can handle I/O more optimally? I think the key is not to impact search performance. But maybe start by writing many files in parallel just to see what is the best we can achieve for parallel merging with this approach, and worry about connecting them together later? It shouldn't be hard to do the stitching: just block copies plus rewriting the metadata.
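The "block copies + rewriting the metadata" stitching idea above can be sketched outside Lucene with plain NIO. Everything here is illustrative (file names, the offsets-as-metadata convention); it is not Lucene's actual file format or API, just the shape of the operation: concatenate independently written part files and record where each one starts.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "stitch" several independently written part files
// into one output file by block-copying them back to back, recording each
// part's start offset as the metadata a reader would need to locate it.
public class StitchSketch {
  public static List<Long> stitch(Path out, List<Path> parts) throws IOException {
    List<Long> offsets = new ArrayList<>();
    try (FileChannel dst = FileChannel.open(out,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      for (Path part : parts) {
        offsets.add(dst.position()); // where this part begins in the stitched file
        try (FileChannel src = FileChannel.open(part, StandardOpenOption.READ)) {
          long remaining = src.size();
          while (remaining > 0) {
            // block copy, letting the OS move bytes without a user-space buffer
            remaining -= src.transferTo(src.size() - remaining, remaining, dst);
          }
        }
      }
    }
    return offsets;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("stitch");
    Path a = Files.write(dir.resolve("a.part"), "hello".getBytes());
    Path b = Files.write(dir.resolve("b.part"), "world".getBytes());
    List<Long> offsets = stitch(dir.resolve("merged"), List.of(a, b));
    System.out.println(offsets); // [0, 5]
  }
}
```

Because `transferTo` can hand the copy to the kernel, the stitching pass stays cheap relative to the merge work itself, which is what makes "write many files in parallel, join later" plausible.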
…ngle merge action (#13124) This commit adds a new interface to all MergeScheduler classes that allows the scheduler to provide an Executor for intra-merge parallelism. The first sub-class to satisfy this new interface is the ConcurrentMergeScheduler (CMS). In particular, the CMS will limit the amount of parallelism based on the current number of mergeThreads and the configured maxThread options. Parallelism is now added by default to the SegmentMerger where each merging task is executed in parallel with the others. Additionally, the Lucene99 HNSW codec will use the provided MergeScheduler executor if no executor is provided directly. Relates to: #12740 Relates to: #9626
…ngle merge action (#13190) This commit adds a new interface to all MergeScheduler classes that allows the scheduler to provide an Executor for intra-merge parallelism. The first sub-class to satisfy this new interface is the ConcurrentMergeScheduler (CMS). In particular, the CMS will limit the amount of parallelism based on the current number of mergeThreads and the configured maxThread options. Parallelism is now added by default to the SegmentMerger where each merging task is executed in parallel with the others. Additionally, the Lucene99 HNSW codec will use the provided MergeScheduler executor if no executor is provided directly. Relates to: #12740 Relates to: #9626 This is a take 2 of: #13124
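The commit message above describes the CMS capping intra-merge parallelism by the number of merge threads already running. A minimal sketch of that capping idea (illustrative only, not the actual ConcurrentMergeScheduler code): wrap a pool in an `Executor` that admits at most a fixed number of spare threads and falls back to running inline on the caller when the budget is exhausted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Illustrative sketch of the idea in the commit message (not Lucene's
// actual CMS implementation): hand merge sub-tasks an executor whose
// parallelism is capped by the number of spare threads; when no spare
// capacity is available, work degrades gracefully to the caller's thread.
public class CappedExecutorSketch {
  public static Executor intraMergeExecutor(ExecutorService pool, int spareThreads) {
    Semaphore permits = new Semaphore(Math.max(0, spareThreads));
    return task -> {
      if (permits.tryAcquire()) {
        pool.execute(() -> {
          try {
            task.run();
          } finally {
            permits.release(); // free the slot for the next sub-task
          }
        });
      } else {
        task.run(); // no spare capacity: execute inline on the merge thread
      }
    };
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Executor ex = intraMergeExecutor(pool, 2);
    CountDownLatch done = new CountDownLatch(4);
    for (int i = 0; i < 4; i++) {
      ex.execute(done::countDown); // at most 2 run on the pool, the rest inline
    }
    done.await();
    pool.shutdown();
  }
}
```

The inline fallback matters: a merge must always make progress even when every configured merge thread is busy, so saturation costs parallelism, never liveness.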
A placeholder issue stemming from the discussion on the mailing list [1]. Not of any high priority.
At the moment any merging from N segments into one will happen sequentially for each data structure involved in a segment (postings, norms, points, etc.). If the input segments are large, the CPU (and I/O) are mostly unused and the process takes a long time.
Merging of these data structures is mostly independent of each other, so it'd be interesting to see if we can speed things up by allowing them to run concurrently. I investigated this on a 40GB index with 22 segments, force-merging this into 1 segment (of similar size). Quick and dirty patch attached.
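The shape of that experiment, independent of the attached patch's exact code, is to submit one task per data structure to a pool and wait for all of them before committing the merged segment. A generic sketch (task names and return values are placeholders, not Lucene's internals):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic sketch: run the per-format merge steps of one segment merge
// (postings, norms, points, ...) as independent tasks on a shared pool,
// and wait for every sub-merge before the merged segment is committed.
public class ParallelSegmentMergeSketch {
  public static List<String> mergeAll(ExecutorService pool) throws Exception {
    Map<String, Callable<String>> tasks = new LinkedHashMap<>();
    tasks.put("postings", () -> "postings merged"); // stand-ins for real merge work
    tasks.put("norms", () -> "norms merged");
    tasks.put("points", () -> "points merged");

    List<Future<String>> futures = new ArrayList<>();
    for (Callable<String> task : tasks.values()) {
      futures.add(pool.submit(task)); // each data structure merges concurrently
    }
    List<String> results = new ArrayList<>();
    for (Future<String> f : futures) {
      results.add(f.get()); // block until every sub-merge has finished
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    System.out.println(mergeAll(pool));
    pool.shutdown();
  }
}
```

This coarse task split is also why the speedup is bounded, as the results below show: the wall-clock time collapses to the slowest single data structure.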
I see some improvement, although it's not by much; the largest component dominates everything else.
Results from an 8-core CPU.
Before:
After:
Ideally, one would need to push fork-join parallelism down into the individual subroutines so that, for example, the postings merge itself utilizes concurrency (pulling blocks of terms concurrently from the input, calculating statistics, etc., and then pushing them in an ordered fashion to the codec).
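That finer-grained pattern (compute blocks concurrently, emit them in order) can be sketched as follows; the block contents and the uppercase "processing" are stand-ins for real term-block work, not Lucene code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

// Sketch of fork-join inside one subroutine: blocks of terms are processed
// concurrently, but results are pushed to the (single-threaded) codec
// writer in their original order.
public class OrderedForkJoinSketch {
  public static List<String> mergeBlocks(List<String> blocks) {
    ExecutorService pool = ForkJoinPool.commonPool();
    List<Future<String>> inFlight = new ArrayList<>();
    for (String block : blocks) {
      // fork: each block's terms/statistics are computed concurrently
      inFlight.add(pool.submit(() -> block.toUpperCase(Locale.ROOT)));
    }
    List<String> written = new ArrayList<>();
    for (Future<String> f : inFlight) {
      try {
        written.add(f.get()); // join in submission order: the codec sees blocks in order
      } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
    return written;
  }

  public static void main(String[] args) {
    System.out.println(mergeBlocks(List.of("aaa", "bbb", "ccc"))); // [AAA, BBB, CCC]
  }
}
```

Joining futures in submission order is what preserves the codec's ordering requirement while still letting the expensive per-block computation overlap.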
[1] https://markmail.org/thread/dtejwq42qagykeac
Migrated from LUCENE-8580 by Dawid Weiss (@dweiss), updated Apr 26 2022
Attachments: LUCENE-8580.patch