
Use TORCH_CUDA_ARCH_LIST to specify which CUDA architectures to build for #3399

Merged: 16 commits into pytorch:master on Mar 30, 2021

Conversation

@NicolasHug (Member) commented Feb 15, 2021:

This PR uses the TORCH_CUDA_ARCH_LIST env variable to specify which virtual and real architectures to build for, instead of using the more verbose NVCC_FLAGS specification.

Closes #2377
Closes #2026
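For illustration, the two styles compare roughly as follows (the flag values are examples chosen for this sketch, not copied from the repo's packaging scripts):

    # Before: one -gencode pair per real/virtual architecture target
    export NVCC_FLAGS="-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75"

    # After: the same intent as a compact list; "+PTX" additionally embeds
    # the virtual (compute_75) target for forward compatibility
    export TORCH_CUDA_ARCH_LIST="7.5+PTX"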

codecov bot commented Feb 15, 2021:

Codecov Report

Merging #3399 (6d383b4) into master (afc502b) will decrease coverage by 3.92%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #3399      +/-   ##
==========================================
- Coverage   78.75%   74.83%   -3.93%     
==========================================
  Files         105      105              
  Lines        9750     9722      -28     
  Branches     1566     1563       -3     
==========================================
- Hits         7679     7275     -404     
- Misses       1581     1960     +379     
+ Partials      490      487       -3     
Impacted Files Coverage Δ
torchvision/datasets/hmdb51.py 28.33% <0.00%> (-66.67%) ⬇️
torchvision/datasets/fakedata.py 29.62% <0.00%> (-62.97%) ⬇️
torchvision/datasets/caltech.py 21.83% <0.00%> (-62.07%) ⬇️
torchvision/datasets/usps.py 32.35% <0.00%> (-61.77%) ⬇️
torchvision/datasets/phototour.py 25.25% <0.00%> (-56.93%) ⬇️
torchvision/datasets/kinetics.py 40.00% <0.00%> (-56.00%) ⬇️
torchvision/datasets/omniglot.py 31.91% <0.00%> (-48.94%) ⬇️
torchvision/datasets/semeion.py 36.36% <0.00%> (-47.73%) ⬇️
torchvision/datasets/sbu.py 23.72% <0.00%> (-47.46%) ⬇️
torchvision/datasets/sbd.py 34.42% <0.00%> (-47.25%) ⬇️
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@NicolasHug (Member, Author) commented:
@ezyang it seems that removing NVCC_FLAGS leads to an IndexError in pytorch, as the packaging machines don't have GPUs. Would you know what an appropriate fix could be?

@ezyang (Contributor) commented Mar 1, 2021:

Try setting TORCH_CUDA_ARCH_LIST in lieu of the manual NVCC flags. It's been a while since I wrote that comment, but I think my intent was that you still specify the arch list, just less low level.
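For example (illustrative values, mirroring the arch lists used elsewhere in this thread):

    # Let PyTorch translate the list into the matching -gencode flags
    TORCH_CUDA_ARCH_LIST="3.5;5.0+PTX;6.0;6.1;7.0;7.5" python setup.py install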

@NicolasHug (Member, Author) commented:
@seemethere I'm trying to replace NVCC_FLAGS with TORCH_CUDA_ARCH_LIST in torchvision, but this seems to lead to an error in pytorch's setup.py: https://app.circleci.com/pipelines/github/pytorch/vision/6799/workflows/557c1018-186a-40ea-8725-5302ef620374/jobs/455001

I'm not sure whether this is an issue with pytorch or if I'm doing something wrong. Would you mind taking a look? Thanks!

@ezyang (Contributor) commented Mar 29, 2021:

Here's the code that's failing:

    # The default is sm_30 for CUDA 9.x and 10.x
    # First check for an env var (same as used by the main setup.py)
    # Can be one or more architectures, e.g. "6.1" or "3.5;5.2;6.0;6.1;7.0+PTX"
    # See cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake
    _arch_list = os.environ.get('TORCH_CUDA_ARCH_LIST', None)

    # If not given, determine what's best for the GPU / CUDA version that can be found
    if not _arch_list:
        arch_list = []
        # the assumption is that the extension should run on any of the currently visible cards,
        # which could be of different types - therefore all archs for visible cards should be included
        for i in range(torch.cuda.device_count()):
            capability = torch.cuda.get_device_capability(i)
            supported_sm = [int(arch.split('_')[1])
                            for arch in torch.cuda.get_arch_list() if 'sm_' in arch]
            max_supported_sm = max((sm // 10, sm % 10) for sm in supported_sm)
            # Capability of the device may be higher than what's supported by the user's
            # NVCC, causing compilation error. User's NVCC is expected to match the one
            # used to build pytorch, so we use the maximum supported capability of pytorch
            # to clamp the capability.
            capability = min(max_supported_sm, capability)
            arch = f'{capability[0]}.{capability[1]}'
            if arch not in arch_list:
                arch_list.append(arch)
        arch_list = sorted(arch_list)
        arch_list[-1] += '+PTX'
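With TORCH_CUDA_ARCH_LIST unset on a machine with no visible GPUs, the loop above never runs, arch_list stays empty, and the final line is where the IndexError comes from. A minimal sketch of that failure mode:

    # Sketch: on a GPU-less build machine, device_count() is 0,
    # so nothing is ever appended to arch_list
    arch_list = []
    for i in range(0):            # stands in for torch.cuda.device_count() == 0
        arch_list.append('7.5')
    arch_list = sorted(arch_list)
    arch_list[-1] += '+PTX'       # IndexError: list index out of range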

This suggests that the environment variable isn't being passed through. Add this env var to the conda meta.yaml, next to the existing script_env entries:

- BUILD_VERSION
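A sketch of the kind of change being suggested, assuming the script_env entries visible in the diff hunk below rather than the file's exact contents:

    build:
      script_env:
        - BUILD_VERSION
        - FORCE_CUDA
        - NVCC_FLAGS
        - TORCH_CUDA_ARCH_LIST  # forward the arch list into the conda build environment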

@NicolasHug changed the title from "WIP remove NVCC_FLAGS from pkg_helpers" to "Remove NVCC_FLAGS from pkg_helpers" on Mar 29, 2021
@NicolasHug changed the title from "Remove NVCC_FLAGS from pkg_helpers" to "Remove NVCC_FLAGS from pkg_helpers and use TORCH_CUDA_ARCH_LIST" on Mar 29, 2021
@NicolasHug changed the title from "Remove NVCC_FLAGS from pkg_helpers and use TORCH_CUDA_ARCH_LIST" to "Use TORCH_CUDA_ARCH_LIST to specify which CUDA architectures to build for" on Mar 29, 2021
@@ -39,6 +39,7 @@ build:
- FORCE_CUDA
- NVCC_FLAGS
Member:

Maybe we can remove this flag now?

Member Author:

I'm not sure we can remove it, as NVCC_FLAGS is still being used in the windows files:

IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=3.5;5.0+PTX;6.0;6.1;7.0;7.5
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_50,code=compute_50
)
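For reference, the -gencode pairs in the ELSE branch map to roughly this arch list (a hand-translated equivalence, not a line from the PR):

    :: sm_35/50/60/70/75 real targets, plus PTX for compute_50
    set TORCH_CUDA_ARCH_LIST=3.5;5.0+PTX;6.0;7.0;7.5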

Member:

I wonder if we still need this, given that PyTorch already sets those in cpp_extension.

Maybe @peterjc123 could chime in?

Member Author:

I removed it in 5055fa8 and it looks like the CI is still happy.

Member:

Yeah, but that codepath might not be exercised on the PR CI, only on the release CIs.

I'd love to get @peterjc123 to double-check this part before we merge this

@peterjc123 (Contributor) commented Mar 30, 2021:

BTW, it seems the script (vision/packaging/windows/cuda101.bat) is not used anymore.

Member:

Great, thanks for the confirmation, @peterjc123!

I've opened #3617 to track removing this file

@fmassa (Member) left a comment:

Thanks @NicolasHug

@fmassa merged commit 226126b into pytorch:master on Mar 30, 2021
facebook-github-bot pushed a commit that referenced this pull request on Apr 2, 2021:
Use TORCH_CUDA_ARCH_LIST to specify which CUDA architectures to build for (#3399)

Summary:
* trying stuff

* Put flags back?

* remove second one just to see

* It worked but it's because it's not built by CI. Trying another one

* Try using TORCH_CUDA_ARCH_LIST instead

* oops

* Add new env variable to build/script_env

* set TORCH_CUDA_ARCH_LIST for the rest of the CUDA versions

* don't pass NVCC_FLAGS venv to conda, let's see if it works

Reviewed By: fmassa

Differential Revision: D27433927

fbshipit-source-id: e207f7fe6a1bab322e41d6dcabe7eb2b8e70b246

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
@xkszltl (Contributor) commented Jul 6, 2021:

I wonder if we should have TORCH_CUDA_ARCH_LIST exported from pytorch to its cmake configs, so that torchvision can build consistently with the pytorch it's relying on?
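A purely hypothetical sketch of what consuming such an export could look like from a downstream CMakeLists.txt (this is not an existing PyTorch feature; the exported variable name is assumed):

    # Hypothetical: reuse the arch list PyTorch was built with, if its
    # exported CMake config ever provides one
    find_package(Torch REQUIRED)
    if(DEFINED TORCH_CUDA_ARCH_LIST)
      set(ENV{TORCH_CUDA_ARCH_LIST} "${TORCH_CUDA_ARCH_LIST}")
    endif()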

@fmassa (Member) commented Aug 13, 2021:

@xkszltl yes, I think this is a good idea and something I would love to see implemented in PyTorch. Can you create an issue in PyTorch to track this?

Development

Successfully merging this pull request may close these issues.

NVCC_FLAGS not needed in pkg_helpers.bash anymore?
Clean up nvcc arch TODO