-
Notifications
You must be signed in to change notification settings - Fork 26.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further reduce the number of alls to head for cached objects #18871
Conversation
The documentation is not available anymore as the PR was closed or merged. |
and also cc @Wauplin :) |
src/transformers/utils/hub.py
Outdated
@@ -244,6 +244,9 @@ def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_h | |||
with open(os.path.join(model_cache, "refs", revision)) as f: | |||
commit_hash = f.read() | |||
|
|||
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)): | |||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily the cleanest here, but I need a return type that is not None
(means file not found in cache) and not a string (I could put "no_exist"
but I'm sure someone will end up naming a cached file like this just to spite me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem with this as long as there's a comment explaining why!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could put "no_exist" but I'm sure someone will end up naming a cached file like this just to spite me
For sure 😁
For such a case, -1
is fine but for a more explicit return you can also create an empty object NO_EXIST
and use it as a return value:
# src/transformers/utils/hub.py
# Return value when trying to load a file from cache but the file does not exist.
NO_EXIST = object() # or "_NO_EXIST"
(...)
def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_hash=None):
(...)
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)):
return NO_EXIST
(...)
def cached_file(...):
(...)
if resolved_file is not NO_EXIST:
return resolved_file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case I agree with @LysandreJik to document it, especially the difference between "a file not existing in the cache (e.g. not cached)" and "a file not existing at all".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed as suggested, let me know if you have more comments on the new version!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewed it and it looks good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
It would be great to have @Wauplin's review before merging
src/transformers/utils/hub.py
Outdated
@@ -244,6 +244,9 @@ def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_h | |||
with open(os.path.join(model_cache, "refs", revision)) as f: | |||
commit_hash = f.read() | |||
|
|||
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)): | |||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem with this as long as there's a comment explaining why!
src/transformers/utils/hub.py
Outdated
@@ -244,6 +244,9 @@ def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_h | |||
with open(os.path.join(model_cache, "refs", revision)) as f: | |||
commit_hash = f.read() | |||
|
|||
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)): | |||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could put "no_exist" but I'm sure someone will end up naming a cached file like this just to spite me
For sure 😁
For such a case, -1
is fine but for a more explicit return you can also create an empty object NO_EXIST
and use it as a return value:
# src/transformers/utils/hub.py
# Return value when trying to load a file from cache but the file does not exist.
NO_EXIST = object() # or "_NO_EXIST"
(...)
def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_hash=None):
(...)
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)):
return NO_EXIST
(...)
def cached_file(...):
(...)
if resolved_file is not NO_EXIST:
return resolved_file
src/transformers/utils/hub.py
Outdated
@@ -244,6 +244,9 @@ def try_to_load_from_cache(cache_dir, repo_id, filename, revision=None, commit_h | |||
with open(os.path.join(model_cache, "refs", revision)) as f: | |||
commit_hash = f.read() | |||
|
|||
if os.path.isfile(os.path.join(model_cache, ".no_exist", commit_hash, filename)): | |||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case I agree with @LysandreJik to document it, especially the difference between "a file not existing in the cache (e.g. not cached)" and "a file not existing at all".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
In general this check could live in huggingface_hub
as well. I created an issue to track it: huggingface/huggingface_hub#1033.
Yes, my plan was to port this to Thanks for the reviews, will address comments later this morning. |
…face#18871) * Further reduce the number of alls to head for cached models/tokenizers/pipelines * Fix tests * Address review comments
What does this PR do?
This PR completes #18534 and leverages the cache system of files that do not exist at a given commit in a repo introduced in the last release of
huggingface_hub
(by this PR) to further reduce the numbers of calls to the API when trying to load configurations/models/tokenizers/pipelines to just 1 call every time the object is cached and the current commit is the same one as the distant repo for the given revision.cc @Narsil