
add shared mkl lib in whl #3461

Merged
merged 10 commits into from
Aug 15, 2017


tensor-tang
Contributor

This is an updated solution for #3401.

This one should fix both #3332 and #3213.
Please try it on your local machine; if you have any questions, feel free to contact me.

By the way, I found that the snippet below did not install the bin files into the expected path /usr/local/opt/paddle/bin, which is probably what causes issue #3421.

data_files=[('/usr/local/opt/paddle/bin',
                       ['${PADDLE_BINARY_DIR}/paddle/scripts/paddle_usage',
                        '${PADDLE_BINARY_DIR}/paddle/trainer/paddle_trainer',
                        '${PADDLE_BINARY_DIR}/paddle/trainer/paddle_merge_model',
                        '${PADDLE_BINARY_DIR}/paddle/pserver/paddle_pserver_main'])]

So I fixed that as well.

@luotao1
Contributor

luotao1 commented Aug 14, 2017

Please merge the latest code.

@tensor-tang
Contributor Author

Thanks~

@luotao1
Contributor

luotao1 commented Aug 14, 2017

Please have a try on your local machine

Which aspects should I check locally?

@tensor-tang
Contributor Author

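Just try to reproduce the two earlier issues. If they can no longer be reproduced, the fix works.

The main steps: after building, pip install the whl, then delete (or rename) the local build directory.
Or, to be safe, install the whl on a clean machine.

Then try to import py_paddle and paddle.v2.framework.core.

Neither import may produce: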

libmklml_intel.so: cannot open shared object file: No such file or directory
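
The check described above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper name `check_imports` is hypothetical, and the module names are simply the two mentioned in this comment. It is meant to be run from outside the build tree, after the whl has been installed with pip.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A shared object that cannot be opened (e.g. libmklml_intel.so)
            # surfaces here as an ImportError while the extension is loaded.
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    # Module names taken from the comment above.
    failed = check_imports(["py_paddle", "paddle.v2.framework.core"])
    for name, message in sorted(failed.items()):
        print("FAILED %s: %s" % (name, message))
    if not failed:
        print("all imports succeeded")
```

An empty result means both modules loaded; any entry whose message mentions libmklml_intel.so would indicate the original issue is still present.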

@luotao1
Contributor

luotao1 commented Aug 14, 2017

I tested locally and got:

>>> import paddle.v2.framework.core
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: ../../third_party/install/mklml/mklml_lnx_2018.0.20170720/lib/libmklml_intel.so: cannot open shared object file: No such file or directory

import paddle and import py_paddle both work.
Also, could we add a unit test specifically for this?

@tensor-tang
Contributor Author

tensor-tang commented Aug 14, 2017

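That doesn't look like this code; it seems the change didn't take effect.

Did you rebuild first and then install?

Or could you run pip uninstall paddlepaddle and paste the list it prints?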

mkl_shared_libs='${MKL_SHARED_LIBS}'
if mkl_shared_libs != '':
    paddle_rt_libs += mkl_shared_libs.split(';')
print paddle_rt_libs
Contributor

Lines 34-39 can be merged, and the print is not needed:

paddle_rt_libs = [] if '${MKL_SHARED_LIBS}'== '' else '${MKL_SHARED_LIBS}'.split(';')
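
The empty-string check in this one-liner matters because `''.split(';')` returns `['']` rather than an empty list. A small sketch of the same logic as a helper; the function name `split_cmake_list` is hypothetical, and the sample values only stand in for whatever CMake substitutes for `${MKL_SHARED_LIBS}`.

```python
def split_cmake_list(value):
    """Split a CMake-style ';'-separated list; an empty string gives []."""
    return [] if value == '' else value.split(';')


# Placeholder values, not the real library paths from the build.
print(split_cmake_list(''))
print(split_cmake_list('liba.so;libb.so'))
# Without the check, an unset variable would yield a list holding one empty string.
print(''.split(';'))
```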

Contributor Author

OK, thx.

@luotao1
Contributor

luotao1 commented Aug 14, 2017

I did rebuild and reinstall:

Uninstalling paddlepaddle-0.10.0:
  /home/luotao/.jumbo/bin/paddle
  /home/luotao/.jumbo/lib/python2.7/site-packages/paddle/__init__.py
  /home/luotao/.jumbo/lib/python2.7/site-packages/paddle/__init__.pyc
...
  /home/luotao/.jumbo/lib/python2.7/site-packages/py_paddle/util.pyc
  /home/luotao/.jumbo/local/lib/libiomp5.so
  /home/luotao/.jumbo/local/lib/libmkldnn.so
  /home/luotao/.jumbo/local/lib/libmkldnn.so.0
  /home/luotao/.jumbo/local/lib/libmklml_intel.so
  /home/luotao/.jumbo/local/opt/paddle/bin/paddle_merge_model
  /home/luotao/.jumbo/local/opt/paddle/bin/paddle_pserver_main
  /home/luotao/.jumbo/local/opt/paddle/bin/paddle_trainer
  /home/luotao/.jumbo/local/opt/paddle/bin/paddle_usage

@tensor-tang
Contributor Author

tensor-tang commented Aug 14, 2017

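OK, I see. But why is everything you installed under the .jumbo directory?

Did you set a prefix when running pip install?

Given that, did you also hit the problem in #3421 when running paddle before?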

@luotao1
Contributor

luotao1 commented Aug 14, 2017

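I did not set a prefix for pip install. The intranet machine has no root access, so I use jumbo to install software and Python packages, which is why everything ends up under jumbo.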

@luotao1
Contributor

luotao1 commented Aug 14, 2017

#3421 was installed with docker, and installing with docker requires a machine with root access.

@tensor-tang
Contributor Author

tensor-tang commented Aug 14, 2017

echo $LD_LIBRARY_PATH
echo $PATH
What do these print?

I suspect you would also run into problems when running paddle on your machine, because it is installed at

/home/luotao/.jumbo/bin/paddle

@luotao1
Contributor

luotao1 commented Aug 14, 2017

[17:34:06]	[Step 1/1] gzip: stdin: invalid compressed data--format violated
[17:34:06]	[Step 1/1] tar: Unexpected EOF in archive
[17:34:06]	[Step 1/1] tar: Unexpected EOF in archive
[17:34:06]	[Step 1/1] tar: Error is not recoverable: exiting now
[17:34:06]	[Step 1/1] make[2]: *** [third_party/mklml/src/extern_mklml-stamp/extern_mklml-download] Error 2
[17:34:06]	[Step 1/1] make[1]: *** [CMakeFiles/extern_mklml.dir/all] Error 2
[17:34:06]	[Step 1/1] CMakeFiles/extern_mklml.dir/build.make:88: recipe for target 'third_party/mklml/src/extern_mklml-stamp/extern_mklml-download' failed
[17:34:06]	[Step 1/1] CMakeFiles/Makefile2:359: recipe for target 'CMakeFiles/extern_mklml.dir/all' failed
[17:34:06]	[Step 1/1] Makefile:160: recipe for target 'all' failed
[17:34:06]	[Step 1/1] make: *** [all] Error 2
[17:34:06]	[Step 1/1] Process exited with code 2
[17:34:06]	[Step 1/1] Process exited with code 2
[17:34:06]	[Step 1/1] Step Build and test (Command Line) failed

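I also hit the error seen in TeamCity. After I deleted the download, re-downloading failed, so I downloaded it manually and ran touch /home/luotao/Paddle/build/third_party/mklml/src/extern_mklml-stamp/extern_mklml-download. I originally thought it was a network speed problem, but since TeamCity hit it too, it is probably not.

For LD_LIBRARY_PATH and PATH, see echo.log.txt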

@luotao1
Contributor

luotao1 commented Aug 14, 2017

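Would you also run into problems when running paddle on your machine?

I added the path /home/luotao/.jumbo/bin in .bashrc, so there is no problem. Many other programs are installed under this bin directory as well.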

@@ -23,6 +23,21 @@ with open('@PADDLE_SOURCE_DIR@/python/requirements.txt') as f:
if '${CMAKE_SYSTEM_PROCESSOR}' not in ['arm', 'armv7-a', 'aarch64']:
    setup_requires+=["opencv-python"]

# the prefix is sys.prefix which should always be usr
paddle_bin_dir = 'local/opt/paddle/bin'
Contributor

Cool! This is the best way to fix the installation dir issue.

Contributor Author

Thanks

Contributor

Tested in docker: the binaries are installed under /usr/local/local/opt/paddle/bin/paddle_usage when Python's main path is under /usr/local. Should this be opt/paddle/bin instead?

Contributor Author

How about outside docker? Is it still local/local?
Or could you set --prefix=/usr when setting up docker?

I thought that outside docker the prefix would default to /usr. I tried it on my machine, and on @luotao1's machine the path still looks right to me:

/home/luotao/.jumbo/local/opt/paddle/bin/paddle_merge_model
/home/luotao/.jumbo/local/opt/paddle/bin/paddle_pserver_main
/home/luotao/.jumbo/local/opt/paddle/bin/paddle_trainer
/home/luotao/.jumbo/local/opt/paddle/bin/paddle_usage

Her case is under /home/luotao/.jumbo.

Contributor

I thought the paddle command is installed by scripts=['${PADDLE_BINARY_DIR}/paddle/scripts/paddle'], so it is installed wherever Python is installed. When pip install runs without --prefix, the paddle command will try to find the binaries relative to that location, whether in docker or not.

In @luotao1's case, does it work if the binaries are installed under /home/luotao/.jumbo/opt/paddle/bin/? If so, would using paddle_bin_dir = 'opt/paddle/bin' solve all the cases?
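
To make the cases in this thread concrete: the distutils/setuptools documentation states that a relative `data_files` directory is interpreted relative to the installation prefix. The sketch below only mimics that join. `resolve_data_dir` is a hypothetical helper, not a setuptools function, and the prefixes are the ones reported in the comments here.

```python
import posixpath


def resolve_data_dir(prefix, data_dir):
    """Mimic where a relative data_files directory lands under a prefix."""
    if posixpath.isabs(data_dir):
        # Absolute directories are deliberately not modelled in this sketch.
        raise ValueError("only relative directories are modelled here")
    return posixpath.normpath(posixpath.join(prefix, data_dir))


# Prefixes mentioned in this conversation; both candidate relative dirs.
for prefix in ['/usr', '/usr/local', '/home/luotao/.jumbo']:
    for data_dir in ['local/opt/paddle/bin', 'opt/paddle/bin']:
        print('%s + %s -> %s' % (prefix, data_dir,
                                 resolve_data_dir(prefix, data_dir)))
```

Under this model, 'local/opt/paddle/bin' gives /usr/local/opt/paddle/bin only when the prefix is /usr, while 'opt/paddle/bin' gives it only when the prefix is /usr/local, so neither value yields that path under every prefix.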

Contributor

@luotao1 luotao1 Aug 15, 2017

@typhoonzero

  • After make install, the binaries are in /home/luotao/Paddle/build/opt/paddle/bin
  • whl path: /home/luotao/Paddle/build/opt/paddle/share/wheels/paddlepaddle-0.10.0-cp27-cp27mu-linux_x86_64.whl
  • After pip install, the binaries are under /home/luotao/.jumbo/local/opt/paddle/bin, and everything Python-related is under /home/luotao/.jumbo/lib/python2.7/site-packages/paddle and /home/luotao/.jumbo/lib/python2.7/site-packages/py_paddle.

@tensor-tang
Contributor Author

tensor-tang commented Aug 14, 2017

@luotao1

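I added the path /home/luotao/.jumbo/bin in .bashrc, so there is no problem.

Thx. In that case LD_LIBRARY_PATH needs to be extended as well; adding /home/luotao/.jumbo/local/lib to it should resolve the problem.

I also hit the error seen in TeamCity. After I deleted the download, re-downloading failed, so I downloaded it manually and ran touch /home/luotao/Paddle/build/third_party/mklml/src/extern_mklml-stamp/extern_mklml-download. I originally thought it was a network speed problem, but since TeamCity hit it too, it is probably not.

I reran it and it passed, so it still looks like a network problem to me, or maybe a server problem?

Also, could we add a unit test specifically for this?

Don't the existing unit tests already exercise this usage? I think some of them do,
because in normal use this import is always required.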

@luotao1
Contributor

luotao1 commented Aug 14, 2017

I added /home/luotao/.jumbo/local/lib to LD_LIBRARY_PATH, but the problem is the same.
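
One way to double-check that the library file is actually visible through the search path is a small lookup like the one below. This is only a sketch: `find_in_search_path` is a hypothetical helper, it treats the environment variable as a plain string, and it does not reproduce the dynamic loader's full resolution rules. In particular, the ImportError above names a path containing slashes (../../third_party/...), and a dependency recorded that way may be loaded by that pathname rather than looked up through LD_LIBRARY_PATH.

```python
import os


def find_in_search_path(lib_name, search_path):
    """Return the directories of an os.pathsep-separated path holding lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_in_search_path("libmklml_intel.so", path))
```

A non-empty list while the import still fails would suggest that the search path is not the problem.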

@tensor-tang
Contributor Author

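@luotao1 I can reproduce your problem now.
However, I found that before the last commit, which merged the latest code, it passed on my side; after the merge it fails, even when installing to usr.
Something in the merged code probably has an effect.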

Working on it.

@tensor-tang
Contributor Author

Done. @luotao1, you can try it now.
The issue was probably caused by the static cblas lib.

Contributor

@luotao1 luotao1 left a comment

LGTM

@QiJune
Member

QiJune commented Aug 15, 2017

@tensor-tang I tested in docker and it looks good!
But I found that when the paddle wheel is installed, a directory /usr/local/local is created.

pushd /usr/local/local
ls /lib
libiomp5.so  libmkldnn.so  libmkldnn.so.0  libmklml_intel.so
ls opt/paddle/bin/
paddle_merge_model  paddle_pserver_main  paddle_trainer  paddle_usage

Is the directory path /usr/local/local right?

@tensor-tang
Contributor Author

Thanks @QiJune
This is the same concern @typhoonzero raised.

This seems to happen because docker uses /usr/local as the prefix instead of /usr.

So could we add --prefix='/usr' when installing in docker?

Then pip install *whl and cmake install would use the same path, /usr/local/opt/paddle/bin.

@luotao1
Contributor

luotao1 commented Aug 15, 2017

@QiJune /usr/local/local in docker will be fixed in next PR by @typhoonzero .

@luotao1 luotao1 merged commit 33d502e into PaddlePaddle:develop Aug 15, 2017
@tensor-tang
Contributor Author

@luotao1 I will add a unit test for this import * case.
I will also add the WarpCTC .so to paddle_rt_lib_dir to try to fix the warpctc dynamic library issue.

Thanks~
