Commit

Add dependency to fix justext (#24)
* Add dependency to fix justext

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add test for imports

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add self arg

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

* Add note about upstream issue

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>

---------

Signed-off-by: Ryan Wolf <rywolf@nvidia.com>
ryantwolf authored Apr 10, 2024
1 parent 9f42c49 commit c78ad21
Showing 2 changed files with 12 additions and 0 deletions.
3 changes: 3 additions & 0 deletions setup.py
@@ -64,6 +64,9 @@
        "presidio-anonymizer==2.2.351",
        "usaddress==0.5.10",
        "nemo_toolkit[nlp]>=1.23.0",
        # justext installation breaks without lxml[html_clean]
        # due to this: https://github.com/miso-belica/jusText/issues/47
        "lxml[html_clean]",
    ],
    entry_points={
        "console_scripts": [
9 changes: 9 additions & 0 deletions tests/test_download.py
@@ -0,0 +1,9 @@
class TestDownload:
    def test_imports(self):
        from nemo_curator.download import (
            download_arxiv,
            download_common_crawl,
            download_wikipedia,
        )

        assert True
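
The import test above passes as long as the three download functions import cleanly, so a missing transitive dependency shows up as a test failure. A more general form of the same smoke check might look like the sketch below. The helper name `find_broken_imports` is hypothetical and is not part of this commit or of `nemo_curator`. The choice of `lxml.html.clean` as the module to probe is an assumption based on the `lxml[html_clean]` extra named in `setup.py`.

```python
import importlib


def find_broken_imports(module_names):
    """Map each module that fails to import to the error message it raised.

    Hypothetical helper for illustration; not part of nemo_curator.
    """
    broken = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            broken[name] = str(exc)
    return broken


if __name__ == "__main__":
    # Assumption: lxml.html.clean is the module that the lxml[html_clean]
    # extra is expected to make importable again.
    for name, error in find_broken_imports(["lxml.html.clean"]).items():
        print(f"{name}: {error}")
```

Run as a script, this prints one line per module that cannot be imported and nothing when every module resolves, which can make it easier to see which dependency is missing than a bare import failure.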
