Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed-radix NTT support all orderings #371

Merged
merged 5 commits into from
Feb 13, 2024
Merged

Conversation

yshekel
Copy link
Collaborator

@yshekel yshekel commented Feb 11, 2024

No description provided.

@yshekel yshekel changed the base branch from main to dev February 11, 2024 12:25
@yshekel yshekel force-pushed the yshekel/orderings_mixed_radix_ntt branch 2 times, most recently from 41c5c9e to 7c450d4 Compare February 11, 2024 12:32
@yshekel yshekel marked this pull request as ready for review February 11, 2024 12:33
@yshekel yshekel force-pushed the yshekel/orderings_mixed_radix_ntt branch 2 times, most recently from 5c6da0c to 3aecf8e Compare February 11, 2024 15:15
@yshekel yshekel force-pushed the yshekel/orderings_mixed_radix_ntt branch from 3aecf8e to c7637d1 Compare February 12, 2024 16:32
Copy link
Contributor

@DmytroTym DmytroTym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue I have with this PR is that it's possible to force old radix-2 algorithm with an ordering like NM or MN. It might seem like test_ntt_batch checks correctness of it but it only compares batch and non-batch outputs of the same algorithm. While the outputs for radix-2 are consistent, they are not correct. Correct me if I'm wrong here. In terms of how to deal with NM and MN orderings in radix-2 NTT - I'm fine with a runtime error tbh cos it doesn't make a lot of sense.

About Rust NTT tests - they now take a pretty long time. One idea might be to put looping over the two algorithms as deeply as possible instead of at the very beginning of the test. This should reduce the CPU side work which is where the vast majority of test time goes into. Also it should make diffs much smaller and nicer to look at.

wrappers/rust/icicle-core/src/ntt/tests.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-core/src/ntt/mod.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-core/src/ntt/mod.rs Outdated Show resolved Hide resolved
icicle/appUtils/ntt/tests/verification.cu Outdated Show resolved Hide resolved
icicle/appUtils/ntt/ntt.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@HadarIngonyama HadarIngonyama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all in all looks fine, just wanted to ask - did you make sure that the changes to the reordering kernels didn't harm the performance of the simple cases (I don't think the should, but just to verify)

wrappers/rust/icicle-core/src/ntt/mod.rs Outdated Show resolved Hide resolved
@yshekel
Copy link
Collaborator Author

yshekel commented Feb 13, 2024

One issue I have with this PR is that it's possible to force old radix-2 algorithm with an ordering like NM or MN. It might seem like test_ntt_batch checks correctness of it but it only compares batch and non-batch outputs of the same algorithm. While the outputs for radix-2 are consistent, they are not correct. Correct me if I'm wrong here. In terms of how to deal with NM and MN orderings in radix-2 NTT - I'm fine with a runtime error tbh cos it doesn't make a lot of sense.

About Rust NTT tests - they now take a pretty long time. One idea might be to put looping over the two algorithms as deeply as possible instead of at the very beginning of the test. This should reduce the CPU side work which is where the vast majority of test time goes into. Also it should make diffs much smaller and nicer to look at.

One issue I have with this PR is that it's possible to force old radix-2 algorithm with an ordering like NM or MN. It might seem like test_ntt_batch checks correctness of it but it only compares batch and non-batch outputs of the same algorithm. While the outputs for radix-2 are consistent, they are not correct. Correct me if I'm wrong here. In terms of how to deal with NM and MN orderings in radix-2 NTT - I'm fine with a runtime error tbh cos it doesn't make a lot of sense.

About Rust NTT tests - they now take a pretty long time. One idea might be to put looping over the two algorithms as deeply as possible instead of at the very beginning of the test. This should reduce the CPU side work which is where the vast majority of test time goes into. Also it should make diffs much smaller and nicer to look at.

@DmytroTym @HadarIngonyama
Regarding NM, MN for radix-2: The way I am thinking about it is as follows: M (=Mixed) is by definition the order of the elements that a DIF NTT/INTT would produce given Natural inputs. DIT NTT/INTT of the same size expects this same order to output the Natural order back. Different NTT sizes would mix the elements differently but consistently per size. For element-wise ops like multiplication this is useful but again the only thing guaranteed here is consistency per size.
Therefore I am thinking of radix-2 R as a special case of M which means that NR an NM are the same for radix-2 (and same for RN and MN). I don't see a reason why this is invalid to do or should throw an assertion. Technically a radix2 NTT can be plugged to the mixed-radix algorithm and the top-level table would be all ones in the logn size, such that digit-reversal is exactly bit-reversal and so we have logn stages, each composed of N/2 NTTS of size 2.

You are right that the batch test is only testing that batch computation is equivalent to a simple loop over the elements with btach=1 for each.

User who want to specifically get R should specify R, not M and this is a stronger request. For radix-2 it's free but for mixed-radix alg it's more expensive and therefore if not required (like for poly multiplication) then M is good enough.

Regarding rust tests, TBH I did not think about it too much and thought that most of the time is building but yes you are right I will try to nest the loop and avoid reallocation of arrays etc. The small downside of that is that in-place computations (when/if we add them) may fail or behave different then expected although since we randomize inputs it's maybe not really a problem.

@yshekel
Copy link
Collaborator Author

yshekel commented Feb 13, 2024

all in all looks fine, just wanted to ask - did you make sure that the changes to the reordering kernels didn't harm the performance of the simple cases (I don't think the should, but just to verify)

@HadarIngonyama do you mean that NN for example is not slower now? I did not verify since I did not think there should be any difference at all but I will verify just to be 100% sure.

@yshekel yshekel force-pushed the yshekel/orderings_mixed_radix_ntt branch from 920f112 to 98682f7 Compare February 13, 2024 12:19
@yshekel yshekel force-pushed the yshekel/orderings_mixed_radix_ntt branch from 98682f7 to 7b72282 Compare February 13, 2024 12:23
@yshekel yshekel merged commit a02459c into dev Feb 13, 2024
14 checks passed
@yshekel yshekel deleted the yshekel/orderings_mixed_radix_ntt branch February 13, 2024 13:49
@DmytroTym DmytroTym mentioned this pull request Feb 15, 2024
DmytroTym added a commit that referenced this pull request Feb 15, 2024
## Contents of this release

[FEAT]: support for multi-device execution:
#356
[FEAT]: full support for new mixed-radix NTT:
#367,
#368 and
#371
[FEAT]: examples for Poseidon hash and tree builder based on it
(currently only on C++ side):
#375
[PERF]: MSM performance upgrades & zero point handling:
#372
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants