Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in cv::cuda::warpPerspective #2361

Open
andrejlevkovitch opened this issue Nov 25, 2019 · 13 comments
Open

Bug in cv::cuda::warpPerspective #2361

andrejlevkovitch opened this issue Nov 25, 2019 · 13 comments

Comments

@andrejlevkovitch
Copy link

andrejlevkovitch commented Nov 25, 2019

System information (version)
  • OpenCV => 4.1.0
  • Operating System / Platform => debian10
  • Compiler => g++-8.3.0
  • Cuda => cuda-10.1
Detailed description

With some matricies (perspective transformation) I get exception about illegal memory access in GPU memory:

OpenCV(4.1.0) /tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) an illegal memory access was encountered in function 'waitForCompletion'

This happens only if I use not default stream and linear interpolation.

Steps to reproduce

Tested with cuda-10.1 and opencv-4.1.0.

// main.cpp

#include <cstdlib>
#include <opencv2/cudawarping.hpp>
#include <vector>

int main() {
  cv::Size inputSize{1000, 1000}; // can be different

  cv::Size outSize{670, 893};
  std::vector<double> values{
      0.3154414342519155,    0.2158186507123098,    133.1013012869572,
      -0.01298153793321658,  0.5681477441984867,    297.7284881948252,
      1.839213382549264e-05, 0.0006366310508410327, 1};

  cv::Mat matrix{cv::Size{3, 3}, CV_64FC1, values.data()};

  cv::cuda::GpuMat image{inputSize, CV_8UC4};

  cv::cuda::Stream stream;
  cv::cuda::warpPerspective(image, image, matrix, outSize, cv::INTER_LINEAR,
                            cv::BORDER_CONSTANT, cv::Scalar{0, 0, 0, 0},
                            stream);
  stream.waitForCompletion();

  return EXIT_SUCCESS;
}
# cmake

cmake_minimum_required(VERSION 3.12)

project(min_ex)

find_package(OpenCV COMPONENTS core cudawarping REQUIRED)

add_executable(${PROJECT_NAME} main.cpp)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_11)
target_link_libraries(${PROJECT_NAME} PRIVATE ${OpenCV_LIBS})
target_include_directories(${PROJECT_NAME} PRIVATE ${OpenCV_INCLUDE_DIRS})
@nglee
Copy link
Contributor

nglee commented Dec 29, 2019

Same on 3.4 branch.

@nglee
Copy link
Contributor

nglee commented Aug 28, 2020

Testing with CUDA8.0 and CUDA10.1, both cases seem to have this bug. GTX 1080 Ti

@andrejlevkovitch
Copy link
Author

andrejlevkovitch commented Jan 14, 2021

I simplify example to:

// main.cpp

#include <cstdlib>
#include <iostream>
#include <opencv2/cudawarping.hpp>
#include <vector>

int main() {
  cv::Size inputSize{1, 1};
  cv::Size outSize{1, 1};

  // clang-format off
  std::vector<float> values{
      0, 0, 0,
      0, 0, 2147483648, // XXX after converting to int we get -2147483648
  };
  // clang-format on

  cv::Mat matrix{cv::Size{3, 2}, CV_32FC1, values.data()};

  cv::cuda::GpuMat image{inputSize, CV_8UC1};
  cv::cuda::GpuMat output;

  cv::cuda::Stream stream;
  cv::cuda::warpAffine(image, output, matrix, outSize,
                       cv::INTER_LINEAR | cv::WARP_INVERSE_MAP,
                       cv::BORDER_CONSTANT, cv::Scalar{0}, stream);
  stream.waitForCompletion();

  return EXIT_SUCCESS;
}

Also I recompile opencv with nvcc flag -lineinfo and run this example by cuda-memcheck. Here is a output:

OpenCV(4.1.0) Error: Gpu API call (unspecified launch failure) in waitForCompletion, file /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp, line 473
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.1.0) /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) unspecified launch failure in function 'waitForCompletion'

========= CUDA-MEMCHECK
========= Invalid __global__ read of size 1
=========     at 0x00000350 in /home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fdb9c600000 is out of bounds
=========     Device Frame:/home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) (void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) : 0x350)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2fe) [0x282a4e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccc19]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccca7]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x702ff5]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x1451b6]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e5e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e8f]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x145430]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x15d267]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14d032]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14b4d4]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x10365f]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 (_ZN2cv4cuda10warpAffineERKNS_11_InputArrayERKNS_12_OutputArrayES3_NS_5Size_IiEEiiNS_7Scalar_IdEERNS0_6StreamE + 0x8e9) [0x62ca8]
=========     Host Frame:min_ex [0x2469]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:min_ex [0x21ba]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x391b13]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 [0xeea61e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 (_ZN2cv4cuda6Stream17waitForCompletionEv + 0x21) [0x21e61f]
=========     Host Frame:min_ex [0x24a0]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:min_ex [0x21ba]
=========
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found

cuda/filters.hpp:97 is just:

const int y1 = __float2int_rd(y);

from device method of LinearFilter::operator().

@andrejlevkovitch
Copy link
Author

So, I think, problem in integer overflow. See on our transformation matrix - if we change our y-move value to 2147483583., then all will works fine.

@andrejlevkovitch
Copy link
Author

andrejlevkovitch commented Jan 14, 2021

Yes, problem is in integer overflow. Integer overflow is UB on cuda (like in c++ standard), so I just try change string that caused error from const int y1 = __float2int_rd(y); to const unsigned int y1 = __float2uint_rd(y); and after it I can't reproduce my error anymore. So, it looks like using signed integer values in cuda device functions is really bad idea.

@andrejlevkovitch
Copy link
Author

#2712 also not reproduced after previous fix

@andrejlevkovitch
Copy link
Author

Unfortunately I can't reproduce this error in some separate cuda function

@andrejlevkovitch
Copy link
Author

Affine transformation supports three types of interpolation: nearest, linear and cubic - but problem reproduce only with INTER_LINEAR and with not default stream only. This is very strange, because function __float2int_rd uses in cubic filter also.

@thilipwka
Copy link

thilipwka commented Sep 9, 2021

Hey !
I have an error with INTER_LINEAR type also, with NEAREST everything is working fine.
How to solve overflow in matrix in this code? Trying to solve this problem for 2 days , no luck (.
Thank you.

Mat CudaWarp(Mat aImgOriginal, Mat matrix, float w, float h) {
	Mat output;
	cv::cuda::GpuMat gpuOutput;
	cv::cuda::GpuMat  gpuInput;

	gpuInput.upload(aImgOriginal);

	cv::cuda::warpPerspective(gpuInput, gpuOutput, matrix, Point(w, h), INTER_LINEAR, BORDER_CONSTANT,0);

	gpuOutput.download(output);


	return output;

}

@andrejlevkovitch
Copy link
Author

@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code

@thilipwka
Copy link

@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code

thank you

@RamadanAhmed
Copy link

I have the same problem

cv::cuda::remap(inputGPU, inputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);

but it solved when I pass different gpu matrix for input and output

cv::cuda::remap(inputGPU, outputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);

@Emmanuel-Messulam
Copy link

@andrejlevkovitch Is there some other way of doing a remap? A workaround?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants