Slow performance for small files #110

Closed
jayqi opened this issue Nov 20, 2020 · 0 comments · Fixed by #111


jayqi commented Nov 20, 2020

Currently, _refresh_cache is called by many methods to ensure the local cache is up to date.

Unfortunately, _refresh_cache makes several network requests to fetch metadata from the cloud storage service. It looks like a run through _refresh_cache that doesn't download a new copy still makes 4 of these requests, while a run that does download makes 6 in total. On my current internet connection, each of these requests takes 350-400 ms. This means that any cloud path method that hits _refresh_cache for a file that exists in cloud storage may take a minimum of 1-2 sec, no matter how small the file is.
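To make the arithmetic concrete, here is a rough back-of-the-envelope estimate using the request counts and per-request latency quoted above. The helper name is made up for illustration, and the estimate assumes the requests run sequentially with no other overhead.

```python
def estimated_overhead_ms(num_requests, per_request_ms):
    """Total time spent waiting on sequential metadata requests (hypothetical helper)."""
    return num_requests * per_request_ms


# Request counts and the 350-400 ms latency range are the figures from this issue.
for label, num_requests in (("no download", 4), ("with download", 6)):
    low = estimated_overhead_ms(num_requests, 350)
    high = estimated_overhead_ms(num_requests, 400)
    print(f"{label}: {low}-{high} ms")
# no download: 1400-1600 ms
# with download: 2100-2400 ms
```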

[Screenshot, 2020-11-19 5:04:49 PM]

def _refresh_cache(self, force_overwrite_from_cloud=False):
    # nothing to cache if the file does not exist; happens when creating
    # new files that will be uploaded
    if not self.exists():
        return

    if self.is_dir():
        raise ValueError("Only individual files can be cached")

    # if not exist or cloud newer
    if (
        not self._local.exists()
        or (self._local.stat().st_mtime < self.stat().st_mtime)
        or force_overwrite_from_cloud
    ):
        # ensure there is a home for the file
        self._local.parent.mkdir(parents=True, exist_ok=True)
        self.download_to(self._local)

        # force cache time to match cloud times
        os.utime(self._local, times=(self.stat().st_mtime, self.stat().st_mtime))

    if self._dirty:
        raise OverwriteDirtyFile(
            f"Local file ({self._local}) for cloud path ({self}) has been changed by your code, but "
            f"is being requested for download from cloud. Either (1) push your changes to the cloud, "
            f"(2) remove the local file, or (3) pass `force_overwrite_from_cloud=True` to "
            f"overwrite."
        )

    # if local newer but not dirty, it was updated
    # by a separate process; do not overwrite unless forced to
    if self._local.stat().st_mtime > self.stat().st_mtime:
        raise OverwriteNewerLocal(
            f"Local file ({self._local}) for cloud path ({self}) is newer on disk, but "
            f"is being requested for download from cloud. Either (1) push your changes to the cloud, "
            f"(2) remove the local file, or (3) pass `force_overwrite_from_cloud=True` to "
            f"overwrite."
        )
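One pattern visible in the snippet is that self.stat() appears several times. As a minimal sketch, assuming each self.stat() call is a separate network request (the issue doesn't break down which calls account for the 4-6 requests), fetching the cloud mtime once and reusing it would cut the count. Everything below is a hypothetical stand-in written for illustration; it is not the library's code and not necessarily the approach taken in #111.

```python
class CountingCloudFile:
    """Hypothetical stand-in for a cloud path that counts simulated metadata requests."""

    def __init__(self, cloud_mtime):
        self._cloud_mtime = cloud_mtime
        self.requests = 0

    def stat_mtime(self):
        # Each call stands in for one round trip to the storage service.
        self.requests += 1
        return self._cloud_mtime


def compare_repeated(remote, local_mtime):
    """Ask the cloud for its mtime at every comparison, as the snippet's shape suggests."""
    cloud_newer = local_mtime is None or local_mtime < remote.stat_mtime()
    local_newer = local_mtime is not None and local_mtime > remote.stat_mtime()
    return cloud_newer, local_newer


def compare_once(remote, local_mtime):
    """Fetch the cloud mtime a single time and reuse the value."""
    cloud_mtime = remote.stat_mtime()
    cloud_newer = local_mtime is None or local_mtime < cloud_mtime
    local_newer = local_mtime is not None and local_mtime > cloud_mtime
    return cloud_newer, local_newer


repeated = CountingCloudFile(cloud_mtime=100.0)
once = CountingCloudFile(cloud_mtime=100.0)

# Both versions reach the same decision for an up-to-date local copy...
assert compare_repeated(repeated, local_mtime=100.0) == compare_once(once, local_mtime=100.0)

# ...but with a different number of simulated requests.
print(repeated.requests, once.requests)  # 2 1
```

At 350-400 ms per request, each avoided round trip saves a noticeable amount of time for small files, where the transfer itself is cheap.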
