From af5f4bda670c90d7f57948f6117aa597326738c3 Mon Sep 17 00:00:00 2001
From: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Date: Mon, 24 Jun 2024 17:08:45 +0200
Subject: [PATCH] Update repositories-recommendations.md for big storage
 requests (#1314)

* Update repositories-recommendations.md for big storage requests

* Update docs/hub/repositories-recommendations.md

Co-authored-by: Julien Chaumond

---------

Co-authored-by: Julien Chaumond
---
 docs/hub/repositories-recommendations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/repositories-recommendations.md b/docs/hub/repositories-recommendations.md
index 310e0429d..e43490aaf 100644
--- a/docs/hub/repositories-recommendations.md
+++ b/docs/hub/repositories-recommendations.md
@@ -31,7 +31,7 @@ Under the hood, the Hub uses Git to version the data, which has structural impli
 If your repo is crossing some of the numbers mentioned in the previous section, **we strongly encourage you to check out [`git-sizer`](https://github.com/github/git-sizer)**, which has very detailed documentation about the different factors that will impact your experience. Here is a TL;DR of factors to consider:
 
-- **Repository size**: The total size of the data you're planning to upload. There is no hard limit on a Hub repository size. However, if you plan to upload hundreds of GBs or even TBs of data, we would appreciate it if you could let us know in advance so we can better help you if you have any questions during the process. You can contact us at datasets@huggingface.co or on [our Discord](http://hf.co/join/discord).
+- **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300 GBs (or even TBs) of data, you will need to ask us to grant more storage. Please provide details of your project. You can contact us at datasets@huggingface.co or on [our Discord](http://hf.co/join/discord).
 - **Number of files**:
     - For optimal experience, we recommend keeping the total number of files under 100k. Try merging the data into fewer files if you have more. For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
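
The unchanged context lines in this hunk recommend merging many small json files into a single jsonl file to keep the total file count under 100k. A minimal sketch of that merge step, using only the Python standard library; the paths `data/json_shards/` and `data/merged.jsonl` are illustrative placeholders, not anything referenced by the docs:

```python
import json
from pathlib import Path

# Placeholder locations: a folder of per-record .json files to merge,
# and the single .jsonl file to produce.
src_dir = Path("data/json_shards")
out_path = Path("data/merged.jsonl")

out_path.parent.mkdir(parents=True, exist_ok=True)

with out_path.open("w", encoding="utf-8") as out:
    for json_file in sorted(src_dir.glob("*.json")):
        with json_file.open("r", encoding="utf-8") as f:
            record = json.load(f)
        # One JSON object per line (JSON Lines format).
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The same idea applies to the other suggestion in that bullet: once the records live in a few large files, they can be exported to Parquet or WebDataset shards instead of being uploaded as many small files.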