relaxed i8x16.swizzle #22

Open
ngzhian opened this issue Apr 19, 2021 · 11 comments
Labels
in-overview (Instruction has been added to Overview.md), instruction-proposal

Comments

@ngzhian
Member

ngzhian commented Apr 19, 2021

  1. What are the instructions being proposed?

relaxed i8x16.swizzle

  2. What are the semantics of these instructions?

relaxed i8x16.swizzle(a, s) selects lanes from a using the indices in s. An index i in the range [0, 15] selects the i-th element of a; the result for any out-of-range index (16-255) is implementation-defined.

  3. How will these instructions be implemented? Give examples for at least
    x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
    Wasm SIMD.

x86/64: pshufb. Out-of-range indices give different results depending on the index:

  • if top bit of index is set, return 0
  • else select the i % 16-th element
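A scalar model of the pshufb rule above, for one 128-bit register. This is a sketch for reasoning about lane results, not the SSSE3 intrinsic itself (`_mm_shuffle_epi8`); the name `pshufb_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of x86 PSHUFB on a 16-byte register:
   if bit 7 of the index byte is set, the lane becomes 0; otherwise
   the low four bits select the lane, i.e. index % 16. */
void pshufb_model(const uint8_t a[16], const uint8_t s[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (s[i] & 0x80) ? 0 : a[s[i] & 0x0F];
    }
}
```

For example, index 17 selects a[1], while indices 0x80 and 0xFF produce 0.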

ARM/ARM64, vtbl and tbl, out of range indices return 0.
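The same kind of scalar model for a single-register 16-byte table lookup, assuming the AArch64 TBL form with one table register (intrinsic `vqtbl1q_u8`). The function name `tbl_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 16-byte table lookup (AArch64 TBL
   with one table register): indices 0-15 select a lane, every other
   index (16-255) yields 0. */
void tbl_model(const uint8_t a[16], const uint8_t s[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (s[i] < 16) ? a[s[i]] : 0;
    }
}
```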

RISC-V V: vrgather.vv a, b; out-of-range indices return 0 (assuming SEW set to 8, LMUL set to 1, and VLEN = 128, so VLMAX = 16).
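A sketch of where VLMAX = 16 comes from (VLMAX = LMUL * VLEN / SEW) and of the vrgather.vv rule for vd, vs2, vs1 with vl = VLMAX, where vs2 holds the data and vs1 the indices. The constant and function names are hypothetical.

```c
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW; with SEW = 8, LMUL = 1, VLEN = 128 this is 16. */
enum { VLEN = 128, SEW = 8, LMUL = 1, VLMAX = LMUL * VLEN / SEW };

/* Hypothetical scalar model of vrgather.vv vd, vs2, vs1 (vl = VLMAX):
   vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]. */
void vrgather_vv_model(const uint8_t vs2[VLMAX], const uint8_t vs1[VLMAX],
                       uint8_t vd[VLMAX]) {
    for (int i = 0; i < VLMAX; i++) {
        vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
    }
}
```

With these settings the out-of-range behavior is the same as ARM's: any index of 16 or more produces 0.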

Simd128: i8x16.swizzle, which returns 0 for every out-of-range index and is therefore a valid implementation.
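Because i8x16.swizzle zeroes out-of-range lanes, it cannot be a bare pshufb on x86/64. One commonly used lowering (stated here as an assumption, not as any particular engine's code) first adds 0x70 to each index with unsigned saturation, so indices 16-255 get their top bit set. The scalar sketch below checks all 256 index values; every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of Wasm SIMD128 i8x16.swizzle: out-of-range lanes become 0. */
static uint8_t wasm_swizzle_lane(const uint8_t a[16], uint8_t idx) {
    return idx < 16 ? a[idx] : 0;
}

/* Scalar model of one PSHUFB lane (the x86/64 rule stated above). */
static uint8_t pshufb_lane(const uint8_t a[16], uint8_t idx) {
    return (idx & 0x80) ? 0 : a[idx & 0x0F];
}

/* Unsigned saturating byte add, as PADDUSB does per lane. */
static uint8_t add_sat_u8(uint8_t x, uint8_t y) {
    unsigned sum = (unsigned)x + y;
    return sum > 0xFF ? 0xFF : (uint8_t)sum;
}

/* Checks every index 0-255: PSHUFB after a saturating add of 0x70 gives
   the same lane as i8x16.swizzle. 0-15 map to 0x70-0x7F (top bit clear,
   low four bits unchanged); 16-255 map to 0x80-0xFF (top bit set -> 0). */
bool pshufb_with_bias_matches_swizzle(const uint8_t a[16]) {
    for (unsigned idx = 0; idx < 256; idx++) {
        uint8_t biased = add_sat_u8((uint8_t)idx, 0x70);
        if (pshufb_lane(a, biased) != wasm_swizzle_lane(a, (uint8_t)idx)) {
            return false;
        }
    }
    return true;
}
```

The extra saturating add is the cost that a relaxed swizzle lets x86/64 skip.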

  4. How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

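Results differ between x86/64 and ARM/ARM64 for out-of-range indices whose top bit is clear (16-127): x86/64 selects the index % 16-th element, while ARM/ARM64 returns 0.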
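A small sketch that counts how many of the 256 possible index bytes can be used to tell the two behaviors apart, using the per-lane rules listed above. A table with no zero bytes is assumed so that "selected a lane" and "returned 0" are never confused; the function names are hypothetical.

```c
#include <stdint.h>

/* Per-lane rules from the list above, as functions of one index byte. */
static uint8_t x86_lane(const uint8_t a[16], unsigned idx) {
    return (idx & 0x80) ? 0 : a[idx & 0x0F];
}
static uint8_t arm_lane(const uint8_t a[16], unsigned idx) {
    return idx < 16 ? a[idx] : 0;
}

/* Counts index values 0-255 on which x86/64 and ARM/ARM64 disagree. */
int count_divergent_indices(void) {
    uint8_t a[16];
    for (int i = 0; i < 16; i++) a[i] = (uint8_t)(i + 1); /* all nonzero */
    int n = 0;
    for (unsigned idx = 0; idx < 256; idx++) {
        if (x86_lane(a, idx) != arm_lane(a, idx)) n++;
    }
    return n;
}
```

The count is 112: exactly the indices 16-127. Indices 0-15 agree, and 128-255 give 0 on both, so a single lane with such an index is enough to distinguish the two.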

  5. What use cases are there?

Swizzle is quite a common operation, e.g. used in multiple places in meshoptimizer.

@nemequ

nemequ commented Apr 19, 2021

On PPC, vpermr (the vec_perm intrinsic) could be used for this. It actually takes two input vectors (plus the index vector) and uses only the lower 5 bits of each index, but if you pass the same vector for both inputs, you effectively get the i % 16 behavior.

On z/Arch there is vperm/vec_perm(), which works the same.
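A scalar sketch of the two-input permute described above: the two 16-byte inputs form one 32-byte table and only the low 5 bits of each index are used (bits 3:7 in IBM bit numbering, where bit 0 is the most significant). Byte order is modelled big-endian, ignoring the adjustments vec_perm makes on little-endian Power; `vperm_model` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of a Power/z vperm-style byte permute:
   index k = s[i] & 31 selects from the 32-byte concatenation a || b. */
void vperm_model(const uint8_t a[16], const uint8_t b[16], const uint8_t s[16],
                 uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        uint8_t k = s[i] & 0x1F;            /* low 5 bits: 0-31 */
        out[i] = (k < 16) ? a[k] : b[k - 16];
    }
}
```

Calling `vperm_model(a, a, s, out)` makes both halves of the table identical, so every lane becomes `a[s[i] % 16]`.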

@jlb6740

jlb6740 commented Apr 27, 2021

This instruction is straightforward and was used as a motivating example for the relaxed-simd proposal itself. One question that comes to mind, though, is the mechanism for enabling it. I think this has been discussed before, but how would we be expected to enable the relaxed version of specific instructions while others remain unrelaxed?

@ngzhian
Member Author

ngzhian commented Apr 27, 2021


We will not enable an existing instruction to be executed in a relaxed manner. The relaxed instruction will be a completely new instruction with different opcode.

@jlb6740

jlb6740 commented Apr 27, 2021

Yes, so then you could have a module that has both swizzle and relaxed swizzle instructions? What I am wondering, then, is: if I am writing C code that is auto-vectorized, for example, is there expected to be a way to specify this to the compiler that's targeting Wasm?

@ngzhian
Member Author

ngzhian commented Apr 27, 2021

Yes, so then you could have a module that has both swizzle and relaxed swizzle instructions?

Yup that is possible.

is there expected to be a way to specify this to the compiler that's targeting Wasm?

Not at the moment. Maybe we can introduce an Emscripten flag to do this, similar to the current -msimd128, that will emit relaxed i8x16.swizzle instead of i8x16.swizzle.

@jlb6740

jlb6740 commented Apr 27, 2021

Yes, a flag makes sense. In fact, I imagine that with proper dependence analysis the compiler could figure out whether it is safe to use the relaxed version of an instruction. Perhaps this should even be a criterion, or part of the motivation, when proposing a relaxed instruction: given a compiler flag granting permission and proper analysis, a compiler could determine when it is safe to generate the relaxed version.

@ngzhian
Member Author

ngzhian commented Apr 27, 2021

In fact I imagine with the proper dependence analysis the compiler could figure out if it is safe to use the relaxed version of an instruction.

Good idea, but likely not possible in the most general case, e.g. if the swizzle indices depend on a mutable global or an imported value, the compiler cannot prove they are in range.
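For contrast, a hypothetical sketch of the case an analysis can handle: if every index is masked with `& 15` before the swizzle, all the out-of-range behaviors discussed in this issue (x86 pshufb, ARM tbl / RISC-V vrgather zeroing, Power vperm with one vector passed twice) produce the same lanes. This illustrates the idea only and is not a description of any existing compiler pass; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-lane models of the three behaviors discussed in this issue. */
static uint8_t lane_x86(const uint8_t a[16], uint8_t k)   { return (k & 0x80) ? 0 : a[k & 0x0F]; }
static uint8_t lane_zero(const uint8_t a[16], uint8_t k)  { return k < 16 ? a[k] : 0; }
static uint8_t lane_mod16(const uint8_t a[16], uint8_t k) { return a[k & 0x0F]; }

/* Returns true if, for every raw index byte, masking it with & 15 makes
   all three behaviors agree, i.e. the relaxed swizzle could replace the
   strict one without changing results. */
bool masked_indices_are_portable(const uint8_t a[16]) {
    for (unsigned raw = 0; raw < 256; raw++) {
        uint8_t k = (uint8_t)(raw & 0x0F); /* index provably in [0, 15] */
        uint8_t r = lane_zero(a, k);
        if (lane_x86(a, k) != r || lane_mod16(a, k) != r) return false;
    }
    return true;
}
```

Without the mask, for example with an index loaded from a mutable global, an index such as 20 gives a[4] on x86/64 but 0 on ARM, so the substitution is not safe.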

@Maratyszcza
Collaborator

if I am writing code in C that is auto-vectorized for example

I don't expect that the compiler would be able to generate either the normal i8x16.swizzle or a relaxed one from auto-vectorized code.

@ngzhian
Member Author

ngzhian commented Nov 1, 2021

Note: vtbl is not available on ARM v8-M MVE AFAICT.

@ngzhian
Member Author

ngzhian commented Nov 1, 2021

RISC-V V has vrgather, which returns 0 for out-of-bounds indices.

@ngzhian
Member Author

ngzhian commented Nov 1, 2021

For Power, this likely requires vperm with a left shift applied to the selection vector (vperm uses bits 3:7 of each selection byte); then it will select modulo 16.

@ngzhian ngzhian added the in-overview Instruction has been added to Overview.md label Feb 18, 2022