Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install locales and generate en_US.UTF-8 #5627

Merged
merged 2 commits into from
May 27, 2022

Conversation

GaryShen2008
Copy link
Collaborator

@GaryShen2008 GaryShen2008 commented May 25, 2022

Related to #5549, but not close it.
Install locales and generate en_US.UTF-8 in ubuntu docker image.
No need to update centos one because there's already en_US.UTF-8.

Signed-off-by: gashen gashen@nvidia.com

Signed-off-by: gashen <gashen@nvidia.com>
@GaryShen2008
Copy link
Collaborator Author

We may also need to update the integration test docker image.

@GaryShen2008
Copy link
Collaborator Author

build

andygrove
andygrove previously approved these changes May 25, 2022
# install locale and use UTF-8
RUN apt-get install -y locales
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the unit test in the Docker container, but what about someone who runs the test outside of this container (i.e.: everybody other than CI) who doesn't happen to have LANG=en_US.UTF-8 setup in their environment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a minimum, we should document this requirement for correct behavior for regexp handling of Unicode text. We could also consider falling back to CPU if LANG != en_US.UTF-8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to make sure this issue isn't considered "fixed" because we finally got the tests to pass. Filed #5629 to track it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated the PR as my comment. It doesn't fix the issue, but prepare the condition to fix it in test scripts.

@jlowe jlowe added the build Related to CI / CD or cleanly building label May 25, 2022
@pxLi
Copy link
Collaborator

pxLi commented May 26, 2022

I would prefer to put this in the https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/run_pyspark_from_build.sh (or the pytest file) directly instead of the dockerfile as its actually part of the test

locales installation should be fine to be put inside the dockerfile. And would be nice to doc this requirement under https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/README.md#prerequisities

Don't set LANG to UTF-8, leave it to the test script

Signed-off-by: gashen <gashen@nvidia.com>
@GaryShen2008
Copy link
Collaborator Author

build

@GaryShen2008
Copy link
Collaborator Author

Updated the PR to only install locales and generate the UTF-8, leave the LANG be set in test script.
I only updated ubuntu docker images, because centos7 docker has the en_US.UTF-8 locale, we can just set LANG=en_US.UTF-8 to switch locale.

@GaryShen2008 GaryShen2008 changed the title Set language to en_US.UTF-8 Install locales and generate en_US.UTF-8 May 26, 2022
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @GaryShen2008

@GaryShen2008 GaryShen2008 merged commit 6deabca into NVIDIA:branch-22.06 May 27, 2022
@andygrove
Copy link
Contributor

andygrove commented May 27, 2022

I would prefer to put this in the https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/run_pyspark_from_build.sh (or the pytest file) directly instead of the dockerfile as its actually part of the test

locales installation should be fine to be put inside the dockerfile. And would be nice to doc this requirement under https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/README.md#prerequisities

@pxLi We also need to set LANG when running the Scala unit tests and I'm not sure which script we would need to update to do that.

IMHO, if only one case/test requires that change, it would be better to be put in the scala test code directly.
Otherwise all other tests will have a non-default ENV w/ specific LANG, this is not expected.
or
if we want to make our default runtime using UTF8, then we would better to document this somewhere to make sure our developers and users know that our plugin requires UTF8 as default LANG in their ENV

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants