You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
To multiply two size-2 ciphertext, (c0, c1) and (d0, d1), we compute three values c0*d0, c0*d1 + c1*d0 and c1*d1.
Current Status
Currently, SEAL uses Karatsuba to compute the second value as c0*d1 + c1*d0 = (c0 + d0) * (c1 + d1) - (c0*d0 + c1*d1)
which save one multiplication as the cost of two more additions and one more subtraction, since the first and the third values are computed after all.
Observation
However, these multiplication, addition, subtraction are done over the finite field, i.e., mod p. I found the one more addition and subtraction of the Karatsuba method is somehow expensive, because it increases the mis-prediction of CPU instructions.
Simpler is Better
If we directly compute the second value c0 * d1 + c1 * d0, we actually only need one modulo reduction. Because the two multiplications can be done with lazy reduction.
I have do some experiences with NFLlib, can found that the naive method can be $10% -- 50%$ faster.
The text was updated successfully, but these errors were encountered:
In the current bfv_multiply implementation the Karatsuba multiplication has nearly no effect on performance because almost all running time is in NTT. Therefore, as far as I can see, the most meaningful change would be to remove the unnecessary Karatsuba branch and just go with the generic approach every time.
Description
To multiply two size-2 ciphertext,
(c0, c1)
and(d0, d1)
, we compute three valuesc0*d0
,c0*d1 + c1*d0
andc1*d1
.Current Status
c0*d1 + c1*d0 = (c0 + d0) * (c1 + d1) - (c0*d0 + c1*d1)
which save one multiplication as the cost of two more additions and one more subtraction, since the first and the third values are computed after all.
Observation
mod p
. I found the one more addition and subtraction of the Karatsuba method issomehow
expensive, because it increases the mis-prediction of CPU instructions.Simpler is Better
c0 * d1 + c1 * d0
, we actually only need one modulo reduction. Because the two multiplications can be done with lazy reduction.The text was updated successfully, but these errors were encountered: