
Build PaddlePaddle from source with latest local changes

Tao Luo edited this page Dec 9, 2019 · 1 revision

*** This is a draft! ***

Why build PaddlePaddle from source?

Normally, a user who wants to use PaddlePaddle can simply pull the latest Docker image from PaddlePaddle's Docker Hub repository.

The latest tag holds the latest prod version and is normally smaller, whereas latest-dev holds the latest develop version with all the necessary tools installed (e.g., Git, Vim, and various C++/Python libraries) and is therefore usually larger. As of May 8, 2018, latest-dev is 2 GB and latest is 529 MB.
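As a minimal sketch of choosing between the two tags, the helper below builds an image reference and pulls it. The repository name paddlepaddle/paddle is an assumption about where the official images are published, and the function names paddle_image and pull_paddle are hypothetical helpers introduced only for illustration.

```shell
# Assumption: official images are published under this Docker Hub repository.
PADDLE_REPO="paddlepaddle/paddle"

# Print the full image reference for a tag (defaults to "latest").
paddle_image() {
  printf '%s:%s\n' "$PADDLE_REPO" "${1:-latest}"
}

# Pull the given tag; this needs a running Docker daemon.
pull_paddle() {
  docker pull "$(paddle_image "$1")"
}
```

With these defined, `pull_paddle latest-dev` would fetch the larger develop image, and `pull_paddle` with no argument would fetch the smaller prod image.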

Note that even the latest-dev Docker image might lack the most recent code or your local changes. A user (probably a developer) who wants to build PaddlePaddle with the latest code, or to include local changes, can build PaddlePaddle from source. Building from source also lets the user customize the build by tuning build parameters; for example, distributed functionality can be disabled (WITH_DISTRIBUTE=OFF) if PaddlePaddle only runs locally.

Steps

Building PaddlePaddle from source does not require much. All you need is:

  1. A computer running Linux, Windows or macOS
  2. Docker

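No other software is needed for the build itself; not even Python or a C++ compiler has to be installed on the host, because the build runs inside Docker. (Git is used below only to fetch the source.) Now let's walk through the steps of building PaddlePaddle from source.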
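Before starting, it can help to confirm that the required tools are on the PATH. The following is a small sketch; missing_tools is a hypothetical helper name, not part of PaddlePaddle or Docker.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools docker git` prints nothing when both tools are installed, and otherwise prints one missing tool per line.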

Git clone PaddlePaddle

Since we are building PaddlePaddle from source, we need to first clone the git repo to get the source.

git clone https://github.com/PaddlePaddle/Paddle.git

Build dev image

The safest way to guarantee that PaddlePaddle works is to first build the latest dev image on your own computer. The dev image has all the dev tools installed, such as Vim, Git and utility libraries. Run the build from the root of the cloned repository (the Paddle directory):

docker build -t mypaddle .

This dev image is very large (5.2 GB as of May 8, 2018).

Build prod image from the dev environment

Now that we have a fresh dev environment from the dev image built in the previous step, we can use it to build a fresh prod PaddlePaddle.

Start a dev container with the parameters of your choice and bash into it:

docker run -it \
  -v `pwd`:/paddle \
  -v /root/.cache:/root/.cache \
  -e WITH_GPU=OFF \
  -e WITH_AVX=ON \
  -e WITH_GOLANG=OFF \
  -e WITH_TESTING=OFF \
  -e WITH_COVERAGE=OFF \
  -e COVERALLS_UPLOAD=OFF \
  -e WITH_C_API=OFF \
  -e CMAKE_BUILD_TYPE=RelWithDebInfo \
  -e WITH_MKL=OFF \
  -e WITH_DEB=OFF \
  -e PADDLE_VERSION=0.10.0 \
  -e PADDLE_FRACTION_GPU_MEMORY_TO_USE=0.15 \
  -e RUN_TEST=OFF \
  -e CUDA_ARCH_NAME=Auto \
  -e WITH_FLUID_ONLY=ON \
  -e WITH_DISTRIBUTE=OFF \
  mypaddle:latest /bin/bash
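Because the build is customized entirely through -e KEY=VALUE pairs, a small helper can keep the option list readable. This is only a sketch: paddle_env_flags is a hypothetical function name, and it assumes option values contain no spaces.

```shell
# Turn KEY=VALUE arguments into a single line of "-e KEY=VALUE" flags
# suitable for splicing into `docker run`. Arguments without "=" are skipped.
paddle_env_flags() {
  flags=""
  for kv in "$@"; do
    case "$kv" in
      *=*) flags="${flags:+$flags }-e $kv" ;;
      *)   echo "skipping malformed option: $kv" >&2 ;;
    esac
  done
  printf '%s\n' "$flags"
}
```

For example, `docker run -it -v "$(pwd)":/paddle $(paddle_env_flags WITH_GPU=OFF WITH_DISTRIBUTE=OFF) mypaddle:latest /bin/bash` would start the dev container with just those two options set; the command substitution is left unquoted on purpose so that the flags split into separate arguments.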

Inside the dev environment, we can generate a Dockerfile for the light-weight prod version:

λ 38f6e151afea /paddle {develop} ./paddle/scripts/paddle_build.sh dockerfile
    ========================================
    Generate /paddle/build/Dockerfile ...
    ========================================

Then we exit the dev environment and go to the build directory on the host (Paddle/build) to build the light-weight prod image, which is smaller and runs faster.

λ 38f6e151afea /paddle {develop} exit
/Paddle$ cd build/
/Paddle/build$ docker build -t mypaddleprod .

This prod image is much smaller (1.6 GB as of May 8, 2018).

Then we can start the prod container and log in to it; all the changes, both Python and C++, take effect in that container:

docker run -it -v `pwd`:/paddle mypaddleprod /bin/bash