Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

own: index recent contributors for ownership signals #51188

Merged
merged 34 commits into from
May 3, 2023

Conversation

coury-clark
Copy link
Contributor

@coury-clark coury-clark commented Apr 26, 2023

This PR hooks up the recent contributors indexer with the background job, effectively enabling indexing once the feature flag is switched on.

Test plan

You can start Sourcegraph and let the jobs auto-index, or you can manually control the queue through the own_background_jobs table.

Here is some example output auto-indexing the local dev repos:

[worker] INFO worker.own-repo-indexing-queue.own.background.index.scheduler.recent-contributors background/scheduler.go:84 Scheduling repo indexes for own job {"job-name": "recent-contributors"}
[worker] INFO worker.own-repo-indexing-queue.own.background.index.scheduler.recent-contributors background/scheduler.go:96 Own index job scheduled {"job-name": "recent-contributors", "row-count": 11}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"TraceId": "1e151b9be9a6cb308ab77719c6ad54c0", "SpanId": "e738c4f0d1af8d29", "name": "own_background_worker", "id": 123}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 124}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"TraceId": "4a857e335aa28ba2717c951cfff07ba5", "SpanId": "04e3ff4d85cb72bc", "name": "own_background_worker", "id": 125}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 126}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"TraceId": "e1b51f47de0dcbfdebd14bfc346f60a2", "SpanId": "0384937856ac05a4", "name": "own_background_worker", "id": 127}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"TraceId": "e1b51f47de0dcbfdebd14bfc346f60a2", "SpanId": "4c39331206218a30", "handle": {"count": 0, "repo_id": 6}}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"TraceId": "4a857e335aa28ba2717c951cfff07ba5", "SpanId": "f7bbf2aa07a3aef2", "handle": {"count": 0, "repo_id": 5}}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 128}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 129}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"handle": {"count": 0, "repo_id": 4}}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 130}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"handle": {"count": 0, "repo_id": 8}}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"handle": {"count": 1, "repo_id": 1}}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 131}
[worker] INFO worker.own-repo-indexing-queue workerutil/worker.go:326 Dequeued record for processing {"name": "own_background_worker", "id": 132}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"TraceId": "1e151b9be9a6cb308ab77719c6ad54c0", "SpanId": "b3615fd2c3989b98", "handle": {"count": 3, "repo_id": 11}}
[worker] INFO worker.own-repo-indexing-queue.Handle background/recent_contributors.go:67 commits inserted {"handle": {"count": 0, "repo_id": 9}}

@cla-bot cla-bot bot added the cla-signed label Apr 26, 2023
@coury-clark coury-clark self-assigned this May 1, 2023
@coury-clark coury-clark requested a review from cbart May 1, 2023 22:22
"github.com/sourcegraph/sourcegraph/internal/types"
)

func Test_RecentContributorIndexFromGitserver(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is largely the same test from the store, but feeding a mock gitserver through to the DB. It effectively tests the entire flow, except the queue.

@@ -324,6 +324,63 @@ func (c *clientImplementor) CommitGraph(ctx context.Context, repo api.RepoName,
return gitdomain.ParseCommitGraph(strings.Split(string(out), "\n")), nil
}

func (c *clientImplementor) CommitLog(ctx context.Context, repo api.RepoName, after time.Time) ([]CommitLog, error) {
args := []string{"log", "--pretty=format:%H<!>%ae<!>%an<!>%ad", "--name-only", "--topo-order", "--no-merges"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added no merges, does that match your thinking @cbart?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes total sense to me

@coury-clark coury-clark changed the title cclark/own/wire up indexer own: index recent contributors for ownership signals May 1, 2023
@coury-clark coury-clark marked this pull request as ready for review May 1, 2023 23:24
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented May 1, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 931d6ad...afdd075.

Notify File(s)
@indradhanush internal/gitserver/client.go
internal/gitserver/commands.go
internal/gitserver/commands_test.go
internal/gitserver/mocks_temp.go
@sashaostrikov internal/gitserver/client.go
internal/gitserver/commands.go
internal/gitserver/commands_test.go
internal/gitserver/mocks_temp.go

Copy link
Contributor

@sashaostrikov sashaostrikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I left some comments, I guess the main thing is to add some test coverage for a new gitserver command

enterprise/internal/own/background/background.go Outdated Show resolved Hide resolved
enterprise/internal/own/background/background.go Outdated Show resolved Hide resolved
}

func (h *handler) Handle(ctx context.Context, logger log.Logger, record *Job) error {
func (h *handler) Handle(ctx context.Context, lgr log.Logger, record *Job) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we have a bit of inconsistent logger naming, in structs it is logger, in function parameters I can see lgr, should we unify it?

I guess lgr is to fix the prefix of logger clashing with the imported log package? At least this is what Goland usually complains to me about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, when you have local variables that shadow the package name. It's annoying because sometimes we have the import alias log and other times logger, so I've been trying something new with lgr to always have a consistent import logger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, makes sense

internal/database/recent_contribution_signal.go Outdated Show resolved Hide resolved
@@ -324,6 +324,63 @@ func (c *clientImplementor) CommitGraph(ctx context.Context, repo api.RepoName,
return gitdomain.ParseCommitGraph(strings.Split(string(out), "\n")), nil
}

func (c *clientImplementor) CommitLog(ctx context.Context, repo api.RepoName, after time.Time) ([]CommitLog, error) {
args := []string{"log", "--pretty=format:%H<!>%ae<!>%an<!>%ad", "--name-only", "--topo-order", "--no-merges"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes total sense to me

internal/gitserver/commands.go Show resolved Hide resolved
internal/gitserver/commands.go Outdated Show resolved Hide resolved
internal/gitserver/commands.go Outdated Show resolved Hide resolved
internal/gitserver/commands.go Show resolved Hide resolved
@@ -324,6 +324,63 @@ func (c *clientImplementor) CommitGraph(ctx context.Context, repo api.RepoName,
return gitdomain.ParseCommitGraph(strings.Split(string(out), "\n")), nil
}

func (c *clientImplementor) CommitLog(ctx context.Context, repo api.RepoName, after time.Time) ([]CommitLog, error) {
args := []string{"log", "--pretty=format:%H<!>%ae<!>%an<!>%ad", "--name-only", "--topo-order", "--no-merges"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by review

Have we considered benchmarking this function to see how this would perform for monorepos with 500k to 1M commits? Asking because we recently saw a bunch of customer issues where load was the primary bottleneck and it seems like this command can be pretty open ended and in the worst case list all the repo's commits?

I'm trying to anticipate problems that could occur in the future but if you all don't see an issue here - feel free to ignore.

Also the benchmarking (if it makes sense) is not a blocking review for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@indradhanush yep, definitely something we are going to test once this can go onto a customer-like instance. Most likely we will end up paging this anyway so we aren't always running the log from HEAD (this is how Code Insights works as well, we log from older commits to get reference hashes)

coury-clark and others added 4 commits May 2, 2023 10:42
Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>
Co-authored-by: Alex Ostrikov <alex.ostrikov@sourcegraph.com>
@@ -3206,3 +3208,54 @@ func TestBlameHunkReader(t *testing.T) {
}
})
}

func Test_CommitLog(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀🚀🚀

Copy link
Contributor

@sashaostrikov sashaostrikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit::shipit::shipit:

@coury-clark coury-clark merged commit c08e946 into main May 3, 2023
@coury-clark coury-clark deleted the cclark/own/wire-up-indexer branch May 3, 2023 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants