-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cached-path file utils #74
Conversation
@epwalsh thank you for the suggestions. I believe the rust-bert/src/common/resources.rs Line 186 in a52279e
cached_path . Could you please use the existing file and methods in resources instead of creating a separate file_utils file?
For now I'd like to keep the concept of local and remote resources - and I believe this would still work with the proposed changes. |
Cargo.toml
Outdated
@@ -37,6 +37,8 @@ serde = { version = "1.0.114", features = ["derive"] } | |||
dirs = "3.0.1" | |||
itertools = "0.9.0" | |||
ordered-float = "2.0.0" | |||
cached-path = "0.4.0-rc1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please ensure a release is added as a dependency for the final version of this PR?
src/common/file_utils.rs
Outdated
@@ -0,0 +1,29 @@ | |||
use crate::common::error::RustBertError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the existing resources
file for these utilities
src/common/file_utils.rs
Outdated
.unwrap(); | ||
} | ||
|
||
fn _get_cache_directory() -> PathBuf { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already exists at
rust-bert/src/common/resources.rs
Line 150 in a52279e
fn _get_cache_directory() -> PathBuf { |
src/common/file_utils.rs
Outdated
home | ||
} | ||
|
||
pub fn cached_path(resource: &str) -> Result<PathBuf, RustBertError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please wrap the cached path functionality directly in
rust-bert/src/common/resources.rs
Line 186 in a52279e
pub fn download_resource(resource: &Resource) -> Result<&PathBuf, RustBertError> { |
Hey @guillaume-be, no problem. One issue though is that, like I think I could make Update: Actually I think it's inherently unsafe to allow for custom arbitrary target paths since this would allow for race conditions. Maybe |
What if the target_path would point to a directory instead (i.e. remove the file name). impl BartModelResources {
/// Shared under MIT license by the Facebook AI Research Fairseq team at https://github.com/pytorch/fairseq. Modified with conversion to C-array format.
pub const BART: (&'static str, &'static str) = (
"bart/model.ot",
"https://cdn.huggingface.co/facebook/bart-large/rust_model.ot",
);
} Would for example become: impl BartModelResources {
/// Shared under MIT license by the Facebook AI Research Fairseq team at https://github.com/pytorch/fairseq. Modified with conversion to C-array format.
pub const BART: (&'static str, &'static str) = (
"bart",
"https://cdn.huggingface.co/facebook/bart-large/rust_model.ot",
);
} and the |
That seems reasonable |
Just made some updates. CI will probably fail. Let me know if you're happy with the API and then I'll fix the compile errors. IMHO though it doesn't make sense to have a Let me know. |
src/common/resources.rs
Outdated
/// Gets the local path for a given resource | ||
/// Gets the local path for a given resource. | ||
/// | ||
/// If the resource is a remote resource, it is downlaoded and cached. Then the path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
@epwalsh Thank you for the change. The
Since fn from_file<P: AsRef<Path>>(path: P) -> T {
let f = File::open(path).expect("Could not open configuration file.");
let br = BufReader::new(f);
let config: T = serde_json::from_reader(br).expect("could not parse configuration");
config
} |
@epwalsh Note that I am unable to build the |
I'll add a Windows build to |
Okay, should build on Windows now 👌 |
Still TODO:
|
@epwalsh This looks great thanks. Actually regarding the first TODO - one could potentially leave it the way it is. As you can see in the CI, one nice effect is that the target file name becomes the name of the folder where the cached version will be stored. This is quite nice as the hashed version makes it difficult to differentiate a What do you think? |
I'm fine with that if you are 👍 |
src/common/resources.rs
Outdated
Resource::Local(_) => Ok(resource.get_local_path()), | ||
} | ||
pub fn download_resource(resource: &Resource) -> Result<PathBuf, RustBertError> { | ||
Ok(resource.get_local_path()?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just nitpicking, but I believe you could return resource.get_local_path()
directly as this is a Result<PathBuf, RustBertError>
@guillaume-be I noticed there were bunch of warnings (unrelated to this PR) when I ran clippy: cargo clippy --all-targets --all-features -- -D warnings Would you want to add a clippy check to the CI steps? And maybe documentation tests as well? And I'm not sure if travis CI supports caching, but I've been able to get a huge speed-up in CI by using caching in GitHub Actions. |
@epwalsh thank you very much for the PR! I did notice the clippy warnings, I am working on fixing them and they will be addressed in the next release. I will then add a clippy check to the CI. The documentation tests are already run and check as part of the CI. Regarding caching, I had it enabled at the beginning for this project, but the time required to compress and decompress the large cache was actually longer than downloading the files again. |
Oh right. By “large files” are you talking about model weights / checkpoint files? I just meant caching the cargo build artifacts. That can make a big difference especially for Clippy, which takes a while to run from scratch every time |
closes #72
So far I've just added a
common/file_utils.rs
module. The idea being that this would replace what's currently incommon/resources.rs
.My question for you is, do you want to keep the
LocalResource
andRemoteResource
abstractions?