add cmd for running notebook

Signed-off-by: Yang Yu <yayu@yayu-mlt.client.nvidia.com>
NVIDIA · Oct 2, 2024 · 1261e75
1 parent a2725ed
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/tutorials/pretraining-data-curation/README.md b/tutorials/pretraining-data-curation/README.md
@@ -6,4 +6,6 @@ This tutorial demonstrates the usage of NeMo Curator to curate the RedPajama-Dat
 RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline. In this tutorial, we will perform data curation on two snapshots for demonstration purposes.
 
 ## Getting Started
-This tutorial is designed for multi-node environment and uses slurm for scheduling allocating resources. To start the tutorial, run the `start-distributed-notebook.sh` script in this directory which will start the Jupyter notebook that demonstrates the step by step walkthrough of the end to end curation pipeline. The notebook will run on port 8000 of the scheduler node. To work with the notenook locally, you can set up a SSH connection to the scheduler node.
+This tutorial is designed for a multi-node environment and uses Slurm for scheduling and allocating resources. To start the tutorial, run the `start-distributed-notebook.sh` script in this directory, which starts the Jupyter notebook that walks through the end-to-end curation pipeline step by step. The notebook will run on port 8000 of the scheduler node. To work with the notebook locally, you can set up an SSH connection to the scheduler node:
+
+`ssh -L <local_port>:localhost:8888 <user>@<scheduler_address>`
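
As an illustration of the port-forwarding command added in this commit, the placeholders could be filled in roughly as follows. This is only a sketch: the user name, host name, and local port are hypothetical values, and the remote port is an assumption that should match whichever port the notebook actually listens on (the README text mentions port 8000, while the added command forwards to 8888). The `-N` flag is an optional addition that tells `ssh` not to run a remote command, which is convenient when the session is used only for forwarding.

```shell
# Hypothetical values for illustration; replace them with your own.
LOCAL_PORT=8888                             # assumed free port on the local machine
REMOTE_PORT=8888                            # assumption: port the notebook listens on
SSH_USER="alice"                            # hypothetical user name
SCHEDULER_ADDRESS="scheduler.example.com"   # hypothetical scheduler node address

# Build the tunnel command. -L forwards LOCAL_PORT on this machine to
# localhost:REMOTE_PORT as seen from the scheduler node; -N skips the remote shell.
TUNNEL_CMD="ssh -N -L ${LOCAL_PORT}:localhost:${REMOTE_PORT} ${SSH_USER}@${SCHEDULER_ADDRESS}"

# Print the command instead of running it, so it can be reviewed first.
echo "${TUNNEL_CMD}"
```

Once the printed command is running, the notebook should be reachable in a local browser at `http://localhost:<local_port>`, provided the remote port is correct.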