Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to tune project listing query again #11620

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 17 additions & 13 deletions readthedocs/projects/querysets.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
"""Project model QuerySet classes."""
from django.db import models
from django.db.models import Count, OuterRef, Prefetch, Q, Subquery
from django.db.models import Count, Max, Prefetch, Q

from readthedocs.core.permissions import AdminPermission
from readthedocs.core.querysets import NoReprQuerySet
Expand Down Expand Up @@ -131,27 +131,31 @@ def prefetch_latest_build(self):
"""
from readthedocs.builds.models import Build

# Prefetch the latest build for each project.
subquery_build_latest = Subquery(
Build.internal.filter(project=OuterRef("project_id"))
.order_by("-date")
.values_list("id", flat=True)[:1]
# Get most recent and recent successful builds
builds_latest = (
Build.internal.filter(project__in=self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear why Build.internal is needed in both the inner query and the prefetch query. It seems like one of them could be Build.objects at very least? The performance is a little better with Build.objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand is to avoid getting builds from PRs (external versions) and I'd say that .internal should perform better than .objects since it removes a lot of builds to consider and they should be removed using an index Build.type. That's the theory, tho 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh same, that was my thought initially. There is some complexity added by this method though, which I think ultimately annoys the query planner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of __in can't we just use project__pk=self.pk here as I did in another PR? That worked pretty good there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is a Project.objects queryset, not an individual model instance.

.values("project")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this line here? I understand the project value is not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not, but this is for grouping by project. The latest build day per project is what is needed here. If there is a different way to group this, we don't need the second query at all.

.annotate(latest=Max("pk"))
.values_list("latest", flat=True)
)
builds_success = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it could be combined into the query above, saving ~400ms.

Build.internal.filter(project__in=self, success=True)
.values("project")
.annotate(latest=Max("pk"))
.values_list("latest", flat=True)
)

# Prefetch the latest build for each project.
prefetch_build_latest = Prefetch(
"builds",
Build.internal.filter(pk__in=subquery_build_latest),
Build.internal.filter(pk__in=builds_latest),
to_attr=self.model.LATEST_BUILD_CACHE,
)

# Prefetch the latest successful build for each project.
subquery_build_successful = Subquery(
Build.internal.filter(project=OuterRef("project_id"))
.order_by("-date")
.values_list("id", flat=True)[:1]
)
prefetch_build_successful = Prefetch(
"builds",
Build.internal.filter(pk__in=subquery_build_successful),
Build.internal.filter(pk__in=builds_success),
to_attr=self.model.LATEST_SUCCESSFUL_BUILD_CACHE,
)

Expand Down