searcher: make indexing of repos concurrent #272

Merged · 6 commits · Jan 18, 2018
Changes from 3 commits
61 changes: 55 additions & 6 deletions searcher/searcher.go
@@ -34,6 +34,15 @@ type Searcher struct {
doneCh chan empty
}

// searcherResult is used to send the result of the newSearcherConcurrent
// function. It carries either a non-nil searcher or a non-nil error,
// depending on what the newSearcher function returns.
type searcherResult struct {
name string
searcher *Searcher
err error
}

type empty struct{}
type limiter chan bool

@@ -277,15 +286,24 @@ func MakeAll(cfg *config.Config) (map[string]*Searcher, map[string]error, error)

lim := makeLimiter(cfg.MaxConcurrentIndexers)

// Channel to receive the results from the newSearcherConcurrent function.
resultCh := make(chan searcherResult)
Contributor Author

@dgryski IMO, making this channel unbuffered won't solve the problem of blocking writes from goroutines. How is an unbuffered channel better than a buffered channel of length 1?

A buffered channel of length len(cfg.Repos), however, would make sense.

Reviewer

Yes, I meant more that having a buffer size of 1 didn't make sense. Either make it totally unbuffered (so they all block), or give it enough space for them all to put their results.

Contributor Author

Ok, since we aren't doing any heavy work after receiving on the channel, I feel it's okay to let the goroutines block when sending on the channel. What do you think? Do you have a preference?

Reviewer

Standard practice for launching a set of goroutines with a response channel is to have it buffered with the number of known entries so that none of them block.

Contributor Author

Ok, I'll use the standard practice then. Thanks 🙂
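
For reference, a minimal standalone sketch of the pattern settled on above: buffer the response channel to the number of goroutines launched so that no sender ever blocks. The names here (result, items) are illustrative only, not part of this PR.

package main

import "fmt"

// result mirrors the role of searcherResult: exactly one value per goroutine.
type result struct {
	name string
	err  error
}

func main() {
	items := []string{"repo-a", "repo-b", "repo-c"}

	// Buffered to len(items) so every goroutine can send without blocking.
	resultCh := make(chan result, len(items))

	for _, name := range items {
		go func(name string) {
			// ... per-item work would run here ...
			resultCh <- result{name: name}
		}(name)
	}

	// Collect exactly one result per launched goroutine.
	for range items {
		r := <-resultCh
		if r.err != nil {
			fmt.Println("failed:", r.name, r.err)
			continue
		}
		fmt.Println("indexed:", r.name)
	}
}

In MakeAll below, this would correspond to make(chan searcherResult, len(cfg.Repos)).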


// Start new searchers for all repos in different goroutines while
// respecting cfg.MaxConcurrentIndexers.
for name, repo := range cfg.Repos {
- s, err := newSearcher(cfg.DbPath, name, repo, refs, lim)
- if err != nil {
- log.Print(err)
- errs[name] = err
go newSearcherConcurrent(cfg.DbPath, name, repo, refs, lim, resultCh)
}

// Collect the results from the resultCh channel for all repos.
for range cfg.Repos {
r := <-resultCh
if r.err != nil {
log.Print(r.err)
errs[r.name] = r.err
continue
}

- searchers[name] = s
searchers[r.name] = r.searcher
}

if err := refs.removeUnclaimed(); err != nil {
@@ -464,3 +482,34 @@ func newSearcher(

return s, nil
}

// newSearcherConcurrent is a wrapper around the newSearcher function.
// It respects cfg.MaxConcurrentIndexers (via lim) while making the
// creation of searchers for the various repositories concurrent.
func newSearcherConcurrent(
dbpath, name string,
repo *config.Repo,
refs *foundRefs,
lim limiter,
resultCh chan searcherResult) {

// acquire a slot from the concurrency limiter
lim.Acquire()
defer lim.Release()

s, err := newSearcher(dbpath, name, repo, refs, lim)
if err != nil {
resultCh <- searcherResult{
name: name,
searcher: nil,
err: err,
}
return
}

resultCh <- searcherResult{
name: name,
searcher: s,
err: nil,
}
}
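
The Acquire and Release calls above belong to the channel-backed limiter declared near the top of this file (type limiter chan bool); its method bodies fall outside the hunks shown in this diff. Below is a minimal sketch of how such a counting-semaphore limiter is typically written; the bodies are an assumption for illustration, not code from this PR.

// Assumed sketch of a channel-based concurrency limiter (not part of this diff).
type limiter chan bool

func makeLimiter(n int) limiter {
	// Capacity n bounds how many goroutines may hold a slot at once.
	return limiter(make(chan bool, n))
}

// Acquire blocks once all n slots are held.
func (l limiter) Acquire() {
	l <- true
}

// Release frees a slot for the next waiting goroutine.
func (l limiter) Release() {
	<-l
}

Under this assumption, with cfg.MaxConcurrentIndexers set to, say, 2, at most two newSearcherConcurrent goroutines proceed past Acquire at a time; the rest block on the channel send until a slot is released.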