Add Jupyter notebook tutorial for single node multilingual dataset #30

nicoleeeluo · 2024-04-11T13:05:45Z

This PR adds a Jupyter notebook workflow for a sample curation pipeline for Thai Wikipedia data.

The workflow includes the following modules:

Wikipedia downloading
Language separation
Unicode formatter
Add ID helper
GPU exact deduplication
GPU fuzzy deduplication
Heuristic filtering for non-English datasets
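
To make the difference between the two deduplication stages concrete, here is a minimal CPU-only sketch in plain Python. It is not the GPU implementation used in the notebook; the helper names (`exact_duplicates`, `shingles`, `jaccard`) and the sample documents are assumptions made up for illustration. Exact deduplication groups documents whose text hashes are identical, while fuzzy deduplication looks for documents whose sets of word n-grams overlap heavily.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(docs):
    """Group document ids by the MD5 hash of their text; keep groups with more than one id."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def shingles(text, n=3):
    """Return the set of word n-grams in the text (one short shingle if the text has fewer than n words)."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample documents: th_0 and th_1 are identical, th_2 differs by one word.
docs = {
    "th_0": "the quick brown fox jumps over the lazy dog",
    "th_1": "the quick brown fox jumps over the lazy dog",
    "th_2": "the quick brown fox jumps over the lazy cat",
}
print(exact_duplicates(docs))
print(jaccard(docs["th_0"], docs["th_2"]))
```

In this sketch only `th_0` and `th_1` are caught by the exact pass; `th_2` would be caught only if the fuzzy pass uses a similarity threshold at or below its Jaccard score against the others.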

arhamm1 · 2024-04-25T15:40:30Z

@ryantwolf / @Maghoumi can we review and get this merged?

arhamm1 · 2024-05-13T17:52:18Z

Pinging again. @Maghoumi, can you help review this? It will help teams trying to create multilingual datasets with Curator!

ayushdg · 2024-05-14T19:36:09Z

Thanks for opening the PR, @nicoleeeluo. All our PRs require commits to be verified (signed) and signed off.
To achieve this, you need to commit with the -sS flags (more info in the Contributing Guide).
More details on signing commits can be found in the guide: https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

nicoleeeluo · 2024-05-15T12:44:27Z

@ayushdg Thanks for the reminder!

ryantwolf

Thanks so much for creating this tutorial! It's quite extensive, which is super helpful. There are a couple of changes I'd appreciate if you could make. Please let me know if you have any concerns or disagree with my requests. Thanks again.

ryantwolf · 2024-05-15T15:59:39Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+    "def pre_imports():\n",
+    "    import cudf \n",
+    "\n",
+    "def load_dataset(input_data_dir, file_type='jsonl'):\n",


Do you think you could use the DocumentDataset.read_json and DocumentDataset.read_parquet methods we have added recently? Let me know if something about them would prevent you from using them.

ryantwolf · 2024-05-15T16:03:47Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+   "source": [
+    "## 2.Language separation and unicode fixing\n",
+    "\n",
+    "**Note**: In order to be run on interactive python. Please comment `from.code import *` and the related imports in `./nemo_curator/filters/__init__.py`"


Can you try this again with the latest version of curator? And please let us know what errors you get if you get any.

This is fixed. I will amend the notebook accordingly.

ryantwolf · 2024-05-15T16:06:19Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+    "TH wikipedia data do have `id` field, but the `id` field contains number only. It will be better if we unified the `id` field and transform it to the format of `<prefix>_<id>`. In this way, when handling multiple dataset, we will able to know which document from which dataset has been removed. This `id` will be useful when we are running deduplication and heuristic filtering. The function we will be using is `AddID()`. Arguments for this function include:\n",
+    "- `id_field`: fields will be added to input .json file. If the key already exists in the .jsonl, it's value will be replaced.\n",
+    "- `id_prefix`: prefix used in ID. Default is 'doc-id'\n",
+    "- `start_index`: starting index in ID. Default is 0"


This recently changed. The default is now None and the id is considered "unordered" by default to improve speed.

Fixed the description. In the code section, I kept start_index = 0 for easier reference.
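
For readers following this thread, the ordered versus unordered distinction can be sketched in plain Python. This is an illustration only, not the library's `AddId` implementation: the function names, the zero-padding widths, and the `<prefix>_<id>` layout (taken from the notebook text above) are assumptions. Ordered IDs need a running count across all partitions, so each partition must know how many documents came before it. Unordered IDs combine only the partition number with a local index, so partitions can be processed independently.

```python
def ordered_ids(partitions, prefix, start_index=0):
    """Assign globally consecutive ids; each partition depends on the sizes of earlier ones."""
    ids, next_index = [], start_index
    for part in partitions:
        ids.append([f"{prefix}_{next_index + i:010d}" for i in range(len(part))])
        next_index += len(part)
    return ids


def unordered_ids(partitions, prefix):
    """Assign unique but non-consecutive ids from (partition number, local index) only."""
    return [
        [f"{prefix}_{p:05d}{i:010d}" for i in range(len(part))]
        for p, part in enumerate(partitions)
    ]


# Hypothetical data: two partitions holding three documents in total.
partitions = [["doc a", "doc b"], ["doc c"]]
print(ordered_ids(partitions, "TH_wiki"))
print(unordered_ids(partitions, "TH_wiki"))
```

Keeping `start_index = 0` in the notebook corresponds to the ordered path here, which is easier to read but requires the cross-partition count.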

ryantwolf · 2024-05-15T16:22:11Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+    "    2. Fuzzy deduplication\n",
+    "4. Heuristic filtering\n",
+    "\n",
+    "What is not included:\n",


It's probably also worth mentioning that this doesn't include:

Distributed data classification with PyTorch models

Personally identifiable information (PII) redaction

nicoleeeluo · 2024-05-17T08:24:04Z

@ryantwolf Hi Ryan, I have pushed a new version to include the fixes you mentioned.

ryantwolf

Perfect! I have one last comment about the Docker image you refer to, but other than that it looks great!

Though, as Ayush mentioned, all our PRs require commits to be verified (signed) and signed off. Quoting him from earlier:

To achieve this, you need to commit with the -sS flags (more info in the Contributing Guide).
More details on signing commits can be found in the guide: https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

Also, it looks like the style check is failing for the config file you made. You should be able to fix it by running pip install pre-commit && pre-commit install && pre-commit run --all-files. Thanks again!

ryantwolf · 2024-05-17T23:01:22Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+    "    Password: <Your NGC Key>\n",
+    "- Get NeMo NeMo Framework Training Container\n",
+    "    ```bash\n",
+    "    docker pull nvcr.io/ea-bignlp/ga-participants/nemofw-training:24.01\n"


I'd change this to either the dev container or the 24.05 tag (which hasn't been released yet):

docker pull nvcr.io/nvidia/nemo:dev.framework
docker pull nvcr.io/nvidia/nemo:24.05.framework

The other container versions don't have the latest version of NeMo Curator that the tutorial uses.

nicoleeeluo · 2024-05-23T01:32:30Z

@ryantwolf Hi Ryan, I have fixed the commits accordingly. Could you review them and let me know if there are any issues? Thank you!

ryantwolf · 2024-05-23T22:58:09Z

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

+    }
+   ],
+   "source": [
+    "client = get_client(args, args.device)\n",


Ah, we just merged a change to this function's signature so that it no longer requires an argparse object. It should be easier to use, but it does mean that you need to update the call here. Ping me again when this is changed and I'll merge it in ASAP.

Thanks for pointing that out. I have updated the notebook accordingly.
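
The kind of signature change described in this thread can be sketched as follows. This is a generic illustration, not the actual `get_client` code: `connect` and its parameters are hypothetical stand-ins. The idea is that a function taking plain keyword arguments can be called directly from a notebook cell, while existing scripts can still forward their parsed argparse values.

```python
import argparse


def connect(cluster_type="cpu", n_workers=1):
    """Hypothetical stand-in for a client factory that takes keyword arguments."""
    if cluster_type not in ("cpu", "gpu"):
        raise ValueError(f"unknown cluster_type: {cluster_type!r}")
    return {"cluster_type": cluster_type, "n_workers": n_workers}


# Notebook usage: no argparse object is needed.
client = connect(cluster_type="gpu")

# Script usage: parse flags as before, then forward them as keywords.
parser = argparse.ArgumentParser()
parser.add_argument("--cluster-type", default="cpu")
parser.add_argument("--n-workers", type=int, default=1)
args = parser.parse_args(["--cluster-type", "gpu", "--n-workers", "2"])
script_client = connect(**vars(args))
```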

ryantwolf

Thanks for the tutorial! This is super great.

nicoleeeluo marked this pull request as ready for review April 11, 2024 13:06

ryantwolf reviewed May 15, 2024

ryantwolf reviewed May 17, 2024

nicoleeeluo and others added 22 commits May 20, 2024 07:54

Init commit for tutorial notebook

ec26f9f

Signed-off-by: Nicole Luo <nluo@nvidia.com>

Fix failing GPU tests with latest pandas bump (NVIDIA#41)

6d99292

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Disable string conversion globally (NVIDIA#56)

794a435

Signed-off-by: Ryan Wolf <rywolf@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

[Tutorials] Add a tutorial for PEFT data curation (NVIDIA#45)

d4a2f0f

This PR adds a new tutorial to demonstrate data curation for PEFT use-cases. Signed-off-by: Mehran Maghoumi <Maghoumi@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Deleting links

c66138a

Signed-off-by: Nicoel Luo <nluo@nvidia.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

148e1d4

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

7e08c96

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

75f5dd7

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

fcd8230

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

48af561

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

nicoleeeluo and others added 10 commits May 20, 2024 07:54

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

49efc21

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

5826eb1

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

30abf29

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

43eae27

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

87eefbd

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

262d8e0

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Update tutorials/single_node_tutorial/single_gpu_tutorial.ipynb

15db6f3

Co-authored-by: Ryan Wolf <ryantwolf1@gmail.com> Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com> Signed-off-by: Nicole Luo <nluo@nvidia.com>

Fixed typo. Update content to lastest NeMo Curator version. Added fuz…

84587b2

…zy deduplication wrapper example Signed-off-by: Nicole Luo <nluo@nvidia.com>

Fixing Style

4b024cb

Signed-off-by: Nicole Luo <nluo@nvidia.com>

Updating container version

0a50fd4

Signed-off-by: Nicole Luo <nluo@nvidia.com>

nicoleeeluo force-pushed the main branch from 6b3392e to 0a50fd4 Compare May 20, 2024 07:55

nicoleeeluo and others added 2 commits May 20, 2024 16:28

Merge branch 'main' into main

c119bf8

Signed-off-by: nicoleeeluo <157772168+nicoleeeluo@users.noreply.github.com>

Fixing style

2a9052c

Signed-off-by: Nicole Luo <nluo@nvidia.com>

ryantwolf reviewed May 23, 2024

nicoleeeluo and others added 2 commits May 24, 2024 10:32

Merge branch 'NVIDIA:main' into main

9ab7144

Update get_client() according to latest version; Update log path for …

11e4eba

…map_bucket section Signed-off-by: Nicole Luo <nluo@nvidia.com>

ryantwolf approved these changes May 24, 2024

ryantwolf merged commit 6fbc3ad into NVIDIA:main May 24, 2024
3 checks passed
