Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOCAL: Pull requests list Speed up non recursive list_dir for google cloud storage #321

Merged
merged 3 commits into from
Feb 16, 2023

Conversation

pjbull
Copy link
Member

@pjbull pjbull commented Feb 16, 2023

Local branch for running live tests on #318.


The non-recursive implementation of list_dir for google cloud was slow.
The issue was already stated in the code that we just filtered out the top-level files and dirs out of al the blobs.

This is fixed by setting the delimiter='/' when listing a directory non-recursive.

Perf results

Before

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Test Name      ┃ Config Name                ┃ Iterations ┃           Mean ┃              Std ┃            Max ┃ N Items ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ List Folders   │ List shallow recursive     │         10 │ 0:00:01.754618 │ ± 0:00:00.043073 │ 0:00:01.854687 │   5,500 │
│ List Folders   │ List shallow non-recursive │         10 │ 0:00:01.839951 │ ± 0:00:00.146716 │ 0:00:02.194600 │   5,500 │
│ List Folders   │ List normal recursive      │         10 │ 0:00:02.670551 │ ± 0:00:00.142644 │ 0:00:02.984851 │   7,877 │
│ List Folders   │ List normal non-recursive  │         10 │ 0:00:02.568673 │ ± 0:00:00.037830 │ 0:00:02.609728 │     113 │
│ List Folders   │ List deep recursive        │         10 │ 0:00:06.076180 │ ± 0:00:00.893733 │ 0:00:07.018396 │   7,955 │
│ List Folders   │ List deep non-recursive    │         10 │ 0:00:05.576774 │ ± 0:00:00.790192 │ 0:00:07.463698 │      31 │
│ Glob scenarios │ Glob shallow recursive     │         10 │ 0:00:01.943264 │ ± 0:00:00.073614 │ 0:00:02.100214 │   5,500 │
│ Glob scenarios │ Glob shallow non-recursive │         10 │ 0:00:02.041357 │ ± 0:00:00.409244 │ 0:00:03.196892 │   5,500 │
│ Glob scenarios │ Glob normal recursive      │         10 │ 0:00:02.708558 │ ± 0:00:00.059786 │ 0:00:02.772853 │   7,272 │
│ Glob scenarios │ Glob normal non-recursive  │         10 │ 0:00:02.355530 │ ± 0:00:00.033168 │ 0:00:02.397684 │      12 │
│ Glob scenarios │ Glob deep recursive        │         10 │ 0:00:08.697164 │ ± 0:00:02.117151 │ 0:00:13.550607 │   7,650 │
│ Glob scenarios │ Glob deep non-recursive    │         10 │ 0:00:10.489393 │ ± 0:00:02.967771 │ 0:00:15.646034 │      25 │
└────────────────┴────────────────────────────┴────────────┴────────────────┴──────────────────┴────────────────┴─────────┘

After

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Test Name      ┃ Config Name                ┃ Iterations ┃           Mean ┃              Std ┃            Max ┃ N Items ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ List Folders   │ List shallow recursive     │         10 │ 0:00:01.782473 │ ± 0:00:00.079213 │ 0:00:01.978639 │   5,500 │
│ List Folders   │ List shallow non-recursive │         10 │ 0:00:01.658828 │ ± 0:00:00.055595 │ 0:00:01.796994 │   5,500 │
│ List Folders   │ List normal recursive      │         10 │ 0:00:02.438342 │ ± 0:00:00.086166 │ 0:00:02.603272 │   7,877 │
│ List Folders   │ List normal non-recursive  │         10 │ 0:00:00.172392 │ ± 0:00:00.008785 │ 0:00:00.191022 │     113 │
│ List Folders   │ List deep recursive        │         10 │ 0:00:08.002241 │ ± 0:00:01.882228 │ 0:00:10.528643 │   7,955 │
│ List Folders   │ List deep non-recursive    │         10 │ 0:00:00.168641 │ ± 0:00:00.007435 │ 0:00:00.179845 │      31 │
│ Glob scenarios │ Glob shallow recursive     │         10 │ 0:00:01.811696 │ ± 0:00:00.034521 │ 0:00:01.851655 │   5,500 │
│ Glob scenarios │ Glob shallow non-recursive │         10 │ 0:00:01.825440 │ ± 0:00:00.080909 │ 0:00:01.984854 │   5,500 │
│ Glob scenarios │ Glob normal recursive      │         10 │ 0:00:02.665863 │ ± 0:00:00.068024 │ 0:00:02.777973 │   7,272 │
│ Glob scenarios │ Glob normal non-recursive  │         10 │ 0:00:00.172190 │ ± 0:00:00.006265 │ 0:00:00.189009 │      12 │
│ Glob scenarios │ Glob deep recursive        │         10 │ 0:00:07.589083 │ ± 0:00:01.563928 │ 0:00:11.491544 │   7,650 │
│ Glob scenarios │ Glob deep non-recursive    │         10 │ 0:00:00.169732 │ ± 0:00:00.006388 │ 0:00:00.179240 │      25 │
└────────────────┴────────────────────────────┴────────────┴────────────────┴──────────────────┴────────────────┴─────────┘

vlaminckaxel and others added 2 commits February 15, 2023 15:53
* use correct env vars for gs, azure bucket perf tests

* speedup non recursive _list_dir for gs

* linting

* make gs mock compatible with delimiter

* linting
@github-actions
Copy link
Contributor

github-actions bot commented Feb 16, 2023

@pjbull pjbull marked this pull request as ready for review February 16, 2023 01:07
@pjbull pjbull merged commit 874044a into master Feb 16, 2023
@pjbull pjbull deleted the 318-local-gs-speedup branch February 16, 2023 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants