Update documentation for new version #83

Merged 3 commits on Jun 3, 2024
31 changes: 26 additions & 5 deletions README.md
@@ -39,21 +39,42 @@ These modules are designed to be flexible and allow for reordering with few exce

NeMo Curator currently requires Python 3.10 and the GPU accelerated modules require CUDA 12 or above installed in order to be used.

-NeMo Curator can be installed manually by cloning the repository and installing as follows -
+### PyPi
+
+NeMo Curator can be installed via PyPi as follows -

For CPU only modules:
```
-pip install .
+pip install nemo-curator
```

-For CPU + CUDA accelerated modules
+For CPU + CUDA accelerated modules:
```
-pip install --extra-index-url https://pypi.nvidia.com ".[cuda12x]"
+pip install --extra-index-url https://pypi.nvidia.com nemo-curator[cuda12x]
```

### NeMo Framework Container

-NeMo Curator is available in the [NeMo Framework Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo). The NeMo Framework Container provides an end-to-end platform for development of custom generative AI models anywhere. The latest release of NeMo Curator comes preinstalled in the container.
+The latest release of NeMo Curator is preinstalled in the [NeMo Framework Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo). The NeMo Framework Container provides an end-to-end platform for development of custom generative AI models anywhere. If you want the latest commit inside the container, uninstall the existing version using
+
+```
+pip uninstall nemo-curator
+```
+And follow the instructions for installing from source below.
+
+### Source
+
+If you want to install the latest commit, please clone the repository and install with either of the following commands
+
+For CPU only modules:
+```
+pip install .
+```
+
+For CPU + CUDA accelerated modules:
+```
+pip install --extra-index-url https://pypi.nvidia.com ".[cuda12x]"
+```

## Usage

4 changes: 2 additions & 2 deletions setup.py
@@ -21,7 +21,7 @@

setup(
name="nemo_curator",
-version="0.2.0",
+version="0.3.0",
description="Scalable Data Preprocessing Tool for "
"Training Large Language Models",
long_description=long_description,
@@ -54,7 +54,7 @@
"jieba==0.42.1",
"comment_parser",
"beautifulsoup4",
-"mwparserfromhell @ git+https://github.com/earwig/mwparserfromhell.git@0f89f44",
+"mwparserfromhell==0.6.5",
"spacy>=3.6.0, <4.0.0",
"presidio-analyzer==2.2.351",
"presidio-anonymizer==2.2.351",
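Because this change adds three install routes (PyPI, container, source) and bumps the version to 0.3.0, it can help to confirm which release an environment actually picked up. The snippet below is a minimal sketch, not part of the PR: `installed_version` is a hypothetical helper, and it relies only on the standard-library `importlib.metadata`. It tries both spellings seen in this diff, `nemo-curator` (README) and `nemo_curator` (setup.py), on the assumption that one of them matches the installed distribution's metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(*names):
    """Return the version of the first distribution found among `names`, else None.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    for name in names:
        try:
            return version(name)
        except PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Both spellings appear in this diff; which one the metadata uses is an assumption.
    found = installed_version("nemo-curator", "nemo_curator")
    if found is None:
        print("NeMo Curator is not installed in this environment")
    else:
        print(f"NeMo Curator version: {found}")
```

Inside the NeMo Framework Container, running this before and after `pip uninstall nemo-curator` and a source install should show whether the preinstalled release was replaced.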