Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix shuffle manager doc on ucx library path #1858

Merged
merged 2 commits into from
Mar 4, 2021

Conversation

rongou
Copy link
Collaborator

@rongou rongou commented Mar 3, 2021

Signed-off-by: Rong Ou rong.ou@gmail.com

The UCX 1.9.0 debian package puts libjucx.so in /usr/lib, which conflicts with the one in our jar file.

Signed-off-by: Rong Ou <rong.ou@gmail.com>
@rongou rongou requested a review from abellina March 3, 2021 18:15
@rongou rongou added the documentation Improvements or additions to documentation label Mar 3, 2021
@abellina
Copy link
Collaborator

abellina commented Mar 3, 2021

I can't reproduce this myself, with a system that has libjucx.so in /usr/lib with LD_LIBRARY_PATH set to /usr/lib:/usr/lib/ucx (which the part you removed should be there by default). I see us opening the libjucx.so that is bundled via the jucx jar. I am not sure why it is picking up the jucx lib from /usr/lib in your case, instead of the one we bundle in our dist jar.

If I force it to load, I do get a segfault (using LD_PRELOAD=/usr/lib/libjucx.so). It seems the libjucx.so for UCX 1.9 in the packaging is different than the one we bundle in our jar (seeing some symbols in the read-only data section that are missing from the /usr/lib one, but there may be other differences). Can you verify using LD_DEBUG=libs that the library in /usr/lib is the one that is getting loaded for you?

libjucx.so shouldn't be included in UCX packaging. I chatted with @petro-rudenko, and he opened a PR to remove libjucx.so from the packaging (openucx/ucx#6440). UCX 1.10 PR is pending. So starting ucx 1.10, libjucx.so is not going to be in /usr/lib.

In the doc we do say: set LD_LIBRARY_PATH if the UCX library is in a non-default location. In terms of this PR I think we should remove the LD_LIBRARY_PATH config provided, as that is just confusing. We could leave the bit after it where we make the user aware this needs native libraries and LD_LIBRARY_PATH may need to be set in some circumstances.

Signed-off-by: Rong Ou <rong.ou@gmail.com>
@rongou
Copy link
Collaborator Author

rongou commented Mar 4, 2021

Removed the line.

@abellina
Copy link
Collaborator

abellina commented Mar 4, 2021

build

@abellina
Copy link
Collaborator

abellina commented Mar 4, 2021

Note that @petro-rudenko's PR went into 1.10: openucx/ucx#6441.

@abellina abellina merged commit ad0b6d9 into NVIDIA:branch-0.5 Mar 4, 2021
@rongou rongou deleted the fix-ucx-doc branch May 11, 2021 17:50
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* fix shuffle manager doc on ucx library path

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* remove ld library path line

Signed-off-by: Rong Ou <rong.ou@gmail.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* fix shuffle manager doc on ucx library path

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* remove ld library path line

Signed-off-by: Rong Ou <rong.ou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants