Commit

Update documentation for new version (#83)
* Update documentation for new version

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Fix bad merge

Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>

---------

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
Signed-off-by: Ayush Dattagupta <ayushdg95@gmail.com>
Co-authored-by: Ayush Dattagupta <ayushdg95@gmail.com>
ryantwolf and ayushdg authored Jun 3, 2024
1 parent e814736 commit 615dce6
Showing 2 changed files with 30 additions and 7 deletions.
33 changes: 28 additions & 5 deletions README.md
@@ -67,13 +67,29 @@ Before installing NeMo Curator, ensure that the following requirements are met:

 ## Install NeMo Curator

-Two options are available for installing NeMo Curator. You can install it from the repository or through the NeMo Framework container.
+You can install NeMo-Curator from PyPi, from source or get it through the NeMo Framework container.

-### Install from the Repository
+### PyPi
+
+NeMo Curator can be installed via PyPi as follows -
+
+To install the CPU-only modules:
+
+```bash
+pip install nemo-curator
+```
+
+To install the CPU and CUDA-accelerated modules:
+
+```bash
+pip install --extra-index-url https://pypi.nvidia.com nemo-curator[cuda12x]
+```
+
+### From Source

 1. Clone the NeMo Curator repository in GitHub.

-```
+```bash
 git clone https://github.com/NVIDIA/NeMo-Curator.git
 cd NeMo-Curator
 ```
@@ -82,20 +98,27 @@ Two options are available for installing NeMo Curator. You can install it from

 To install the CPU-only modules:

-```
+```bash
 pip install .
 ```

 To install the CPU and CUDA-accelerated modules:

-```
+```bash
 pip install --extra-index-url https://pypi.nvidia.com ".[cuda12x]"
 ```

 ### Install from the NeMo Framework Container

 NeMo Curator is available in the [NeMo Framework Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). The latest release of NeMo Curator comes preinstalled in the container.

+If you want the latest commit inside the container, uninstall the existing version using:
+
+```bash
+pip uninstall nemo-curator
+```
+And follow the instructions for installing from source from [above](#from-source).
+
 ## Use the Python Library

 The following snippet demonstrates how to create a small data curation pipeline that downloads and curates a small subset of the Common Crawl dataset.
4 changes: 2 additions & 2 deletions setup.py
@@ -21,7 +21,7 @@

 setup(
 name="nemo_curator",
-version="0.2.0",
+version="0.3.0",
 description="Scalable Data Preprocessing Tool for "
 "Training Large Language Models",
 long_description=long_description,
@@ -54,7 +54,7 @@
 "jieba==0.42.1",
 "comment_parser",
 "beautifulsoup4",
-"mwparserfromhell @ git+https://github.com/earwig/mwparserfromhell.git@0f89f44",
+"mwparserfromhell==0.6.5",
 "spacy>=3.6.0, <4.0.0",
 "presidio-analyzer==2.2.351",
 "presidio-anonymizer==2.2.351",
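
To check which release an environment actually picked up after following the install steps in the README diff, a minimal sketch using the standard library's `importlib.metadata` could look like the following. The helper name `installed_version` is hypothetical and not part of NeMo Curator. The distribution name `nemo-curator` is taken from the `pip` commands in this commit, and the value to expect after this change would be `0.3.0` per the `setup.py` diff.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "nemo-curator" is the name used by the pip commands in this commit.
    found = installed_version("nemo-curator")
    if found is None:
        print("nemo-curator is not installed in this environment")
    else:
        print(f"nemo-curator {found} is installed")
```

Reading the metadata rather than importing the package keeps the check cheap and avoids loading the optional CUDA dependencies just to read a version string.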