Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complex-to-real inverse 2D FFT: incorrect results #128

Closed
nodwellmtt opened this issue Jun 26, 2018 · 16 comments
Closed

complex-to-real inverse 2D FFT: incorrect results #128

nodwellmtt opened this issue Jun 26, 2018 · 16 comments
Assignees

Comments

@nodwellmtt
Copy link

I am unable to get correct results from an inverse 2D Hermitian-to-real FFT. Generally I am inclined to suspect an error on my part, but after much investigation, I can't find any error, and suspect it may be in the library. If anyone is able to point out an error, I would much appreciate it.

See the attached source code, which is as brief an example as I could manage. Data is all in raw pointers to maintain transparency.

This code:

  1. Generates some simple synthetic data in K-space in Hermitian form (with a single non-zero unique value). By Hermitian I mean taking advantage of K(i,j) = K(M-i,N-j)* to reduce storage in the usual way (N/2+1 values along the innermost axis). Call this result A.
  2. Applies a Hermitian-to-real inverse FFT (result B).
  3. Generates the same K-space data, but in full complex form, no reduced storage; applies a complex-to-complex inverse FFT to this data (result C). Result B and result C differ significantly.
  4. Applies a real-to-Hermitian forward FFT to result B. This should be the same as result A, but is significantly different.
  5. The real part of result C is extracted and a real-to-Hermitian forward FFT is applied to it. This does recover result A, indicating that at least step 3 is correct.

Library versions:
hipcc version: 1.5.18151
rocfft version 0.8.1

rocfft_real_inverse.cpp.txt

@ex-rzr
Copy link

ex-rzr commented Oct 8, 2018

I think I encountered the same problem (incorrect results for inverse FFT Hermitian to real).

I can confirm that rocfft_real_inverse.cpp shows the described behavior on rocFFT 0.8.6 and ROCm 1.9.

I've checked rocfft-rider, it fails on real inverse and passes on other transforms:

rocFFT/build$ clients/staging/rocfft-rider -t 3

precision: single
transform type: real inverse
result placement: in-place

input array type: hermitian interleaved
output array type: real

dimensions: 1
lengths: 1024
batch size: 1

input offset: 0
output offset: 0

input strides: 1
output strides: 1
input distance: 513
output distance: 1026

scale: 1



		Rider Test *****FAIL*****

@bragadeesh
Copy link
Contributor

I am able to reproduce this; the host code in the library appears ok, it generates the correct parameters for the kernel, bug has to be in one the kernels.

@ex-rzr
Copy link

ex-rzr commented Oct 10, 2018

I have a fix for 2D and 3D: develop...ex-rzr:fix_hermitian_inverse

The resulting matrix was correct only for 1D (only x index was computed for conjugates, y and z were not changed).

It seems that rocfft-rider -t 3 failures are not caused by this issue only. IIRC H-to-real check in rocfft-rider ignores scaling by the number of values (compared to checks of other transforms).

I'm not sure that my fix is completely correct (and it's probably not optimal), but rocfft_real_inverse.cpp produces correct results (2D) as well as my code produces correct results for 3D transform.

Another question is why all rocFFT tests pass even without the fix. I haven't checked them very deeply but there are tests of Hermitian-to-real for 1D and 2D that compare results against FFTW.

@bragadeesh
Copy link
Contributor

@ex-rzr
I have checked in a fix in the develop branch; yes the issue was the inverse Hermitian was not handling the multi-dimensional cases correctly. When I get more time I have to look at the tests, which may have some bugs. Also, these solutions are intermediate until performant algorithms without copies for real FFTs are implemented.

Good to see you may have been working on this as well.

@bragadeesh
Copy link
Contributor

@nodwellmtt
its probably not a good idea to start crafting input in the Hermitian space (especially for multi-dimensional transforms) as the data have to be carefully selected so as to not violate the FFT properties involving real data. It is best to start the random test inputs from the real signal space.

Complex FFTs have no special restrictions, hence it is not always valid to compare complex and real FFTs

@nodwellmtt
Copy link
Author

@bragadeesh
In this case the k-space function was Hermetian, and the full complex and compressed Hermitian representations were in fact the same function (easy to verify, since there was a single non-zero value in the Hermitian representation), so the regular inverse FFT and Hermitian-to-real FFTs must give the same result (and do using other FFT libraries), other than the detail than one returns real values directly, and the other returns complex values that all have zero imaginary part. It was the simplest example that I could construct that demonstrated the error in rocfft. It is true that typical applications start (and end) in real-space (eg convolution), but such an example would have required more than a single transform, thus leaving ambiguous which was the source of error.

There is nothing particularly difficult about the multi-dimensional case, except knowing how the data are laid out, which is a matter of convention.

In any case I am glad this issue is being addressed.

@bragadeesh
Copy link
Contributor

@nodwellmtt
I haven't checked your code in detail, only briefly looked at it. Can you try the latest develop branch and let us know if you are still seeing error

@nodwellmtt
Copy link
Author

Will do! I should have a chance to get to that this afternoon or tomorrow.

@nodwellmtt
Copy link
Author

@bragadeesh
Our internal test suite now passes. The following two tests which formerly failed pass with the current develop branch:

  1. Comparison of a small 2D inverse Hermitian-to-real FFT with an inverse DFT.
    2, Verification on a large 2D data set that the 2D inverse Hermitian-to-real FFT is in fact the inverse of the 2D real-to-Hermitian forward FFT.

@bragadeesh
Copy link
Contributor

@nodwellmtt
thanks for the update, good to hear that

@ex-rzr
Copy link

ex-rzr commented Oct 26, 2018

@bragadeesh
I checked your fix and it turned out that hermitian-to-real transform returns incorrect results for 3D.
But luckily I've found this tiny error quickly (also memory leak removed):

diff --git a/library/src/device/complex2real.cpp b/library/src/device/complex2real.cpp
index 56e6806..3b64dce 100644
--- a/library/src/device/complex2real.cpp
+++ b/library/src/device/complex2real.cpp
@@ -244,7 +244,7 @@ void hermitian2complex(const void *data_p, void *back_p)
         std::cout << "Error: dimension larger than 3, which is not handled" << std::endl;
 
     size_t dim_1 = 1, dim_2 = 1;
-    if(data->node->length.size() == 2)
+    if(data->node->length.size() >= 2)
         dim_1 = data->node->length[1];
     if(data->node->length.size() == 3)
         dim_2 = data->node->length[2];
@@ -258,10 +258,10 @@ void hermitian2complex(const void *data_p, void *back_p)
 
     hipStream_t rocfft_stream = data->rocfft_stream; 
 
-    float2* tmp; tmp = (float2*)malloc(sizeof(float2)*input_distance*batch);
+    /*float2* tmp; tmp = (float2*)malloc(sizeof(float2)*input_distance*batch);
     hipMemcpy(tmp, input_buffer, sizeof(float2)*input_distance*batch, hipMemcpyDeviceToHost);
 
-    /*printf("herm size %d, dim0 %d, dim1 %d, dim2 %d\n", hermitian_size, dim_0, dim_1, dim_2);
+    printf("herm size %d, dim0 %d, dim1 %d, dim2 %d\n", hermitian_size, dim_0, dim_1, dim_2);
     printf("input_stride %d output_stride %d input_distance %d output_distance %d\n", input_stride, output_stride, input_distance, output_distance);
 
     //for(size_t j=0;j<data->node->length[1]; j++)

bragadeesh added a commit to bragadeesh/rocFFT that referenced this issue Oct 26, 2018
@bragadeesh
Copy link
Contributor

thanks @ex-rzr

bragadeesh added a commit that referenced this issue Oct 26, 2018
@malcolmroberts
Copy link
Contributor

@bragadeesh , has this been solved?

@malcolmroberts malcolmroberts self-assigned this Sep 9, 2019
@bragadeesh
Copy link
Contributor

closing this old one

@antholzer
Copy link

I am still seeing this issue e.g. if I modify the sample from the docs like this (also fixed error calculation, if I do not change the array the inverse is correct):

--- docs/samples/real2complex_2d.cpp	2019-11-18 17:42:51.381516325 +0100
+++ docs/samples/real2complex_2d.cpp	2019-11-21 17:46:01.822548764 +0100
@@ -53,6 +53,7 @@
             cx[i * Nystride + j] = i + j;
         }
     }
+    cx[0] = 1.23; cx[1+Nystride] = 0.54; cx[4+Nystride] = -1.67;
     for(size_t i = 0; i < Nx; i++)
     {
         for(size_t j = 0; j < Nystride; j++)
@@ -185,7 +186,8 @@
     {
         for(size_t j = 0; j < Ny; j++)
         {
-            float diff = std::abs(backx[i] * overN - cx[i]);
+            const size_t pos = i * Nystride + j;
+            float diff = std::abs(backx[pos] * overN - cx[pos]);
             if(diff > error)
             {
                 error = diff;

I get Maximum error: 4.83973 (forward transformation is correct).
I am using the docker rocm image (with rocm v2.9) with kernel 5.3.10 and rocFFT v0.9.7. GPU is gfx803.

@malcolmroberts
Copy link
Contributor

This is a bug that, unfortunately, got in to 2.9. The offending commit is 0453c45, which was reverted in 3b0813f. We've fixed the algorithmic error, which was added in 0b30436; this will make into 3.0.

feizheng10 pushed a commit that referenced this issue May 5, 2020
* Add unit test to ensure multi-threaded logs don't get garbled

* Bump library to C++14 internally, as thread-safe logging requires it

* Emit cleanup trace message before closing the log file

* Import thread-safe logging code from rocBLAS.

This introduces a new rocfft_ostream type that buffers data for a
per-output-file worker thread, to prevent messages from being garbled when
multiple threads all log to the same destination.

The rocfft_ostream objects must be per-thread (even if they would log to the
same place), so in rocfft_setup we now merely open file descriptors to the
output files, and construct thread-local rocfft_ostream objects when the first
message is emitted from each thread.

* Clean up log files and layer mode during rocfft_cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants