This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

instruction encoding #46

Closed
tlively opened this issue Oct 23, 2018 · 9 comments
Comments

@tlively
Member

tlively commented Oct 23, 2018

The proposal says, "All SIMD instructions are encoded as a 0xfd prefix byte followed by a SIMD-specific opcode in LEB128 format," but this is not consistent with the instruction encoding in other proposals. For example, the non-trapping float-to-int and threads proposals both encode their instructions as a single prefix byte followed by a single byte identifying the instruction.
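For illustration, the two encodings being compared can be sketched in Python as follows. This is a minimal sketch, not code from any proposal text or tool: the helper names are hypothetical, and only the `0xfd` prefix byte comes from the proposal sentence quoted above.

```python
SIMD_PREFIX = 0xFD  # prefix byte quoted from the SIMD proposal text


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)


def encode_prefixed_leb128(opcode):
    """Prefix byte followed by the opcode in LEB128 (the SIMD proposal's wording)."""
    return bytes([SIMD_PREFIX]) + encode_uleb128(opcode)


def encode_prefixed_byte(opcode):
    """Prefix byte followed by exactly one opcode byte (the alternative raised here)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("a single opcode byte can only hold 0..255")
    return bytes([SIMD_PREFIX, opcode])
```

The two helpers produce identical bytes for opcodes below 128 and diverge from 128 upward, where LEB128 needs a second opcode byte.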

The only reason I see to use LEB128 for SIMD instructions is to avoid running out of opcodes. But we don't seem to be in danger of running out of opcodes for 128-bit SIMD, and if we were ever going to add many more SIMD instructions, we could always use more prefixes as well.

Since this divergence from the norm seems unnecessary, I propose we change the encoding of SIMD instructions to be just two bytes, as in other proposals. Is there any context for this discussion that I am missing?

Adopting this change would obviate this LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=39272.

@binji
Member

binji commented Oct 23, 2018

We discussed this in the May 2017 CG meeting, and decided on a single prefix byte followed by LEB128 code. We can revisit this, but generally we only do so if there is new information.

If the other proposals are using a single byte followed by a single byte for opcode, then they should be updated to allow for a LEB128 encoding. We can generally encode this using a single byte, but we should allow for "long" encodings as well.
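As a rough sketch of what a decoder following the "prefix byte plus LEB128" rule might look like, the hypothetical Python helper below reads one prefixed opcode from a byte string. The function name, the return shape, and the five-byte (u32) length limit are assumptions made for this example, not something stated in this thread. Because it is an ordinary LEB128 reader, it accepts the one-byte form as well as longer forms of the same opcode.

```python
def decode_prefixed_opcode(data, offset=0):
    """Return (prefix, opcode, next_offset) for a prefix byte + unsigned LEB128 opcode."""
    if offset >= len(data):
        raise ValueError("no prefix byte at this offset")
    prefix = data[offset]
    pos = offset + 1
    opcode = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 opcode")
        byte = data[pos]
        pos += 1
        opcode |= (byte & 0x7F) << shift  # accumulate 7 payload bits per byte
        if byte & 0x80 == 0:              # high bit clear: this was the last byte
            return prefix, opcode, pos
        shift += 7
        if shift >= 35:                   # assumed limit: at most 5 bytes (a u32)
            raise ValueError("LEB128 opcode longer than 5 bytes")
```

A decoder that instead read exactly one byte after the prefix would misparse any opcode whose LEB128 form spans more than one byte, which is the kind of tool mismatch discussed in this issue.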

@tlively
Member Author

tlively commented Oct 23, 2018

It still seems weird to me, since all of the MVP opcodes are bytes rather than LEB128 values. SIMD is unique among proposed extensions in that it has enough opcodes that using LEB128 to encode them will require more space than just using two bytes.
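To make the space point concrete: one LEB128 byte carries 7 payload bits, so only opcodes 0 through 127 fit in a single byte after the prefix. The sketch below assumes, purely for illustration, that opcodes are numbered densely from 0; the function names are hypothetical.

```python
def uleb128_len(value):
    """Number of bytes the unsigned LEB128 encoding of value occupies."""
    length = 1
    while value >= 0x80:  # each extra byte adds 7 more payload bits
        value >>= 7
        length += 1
    return length


def opcodes_needing_extra_byte(opcode_count):
    """Count how many of the opcodes 0..opcode_count-1 need more than one LEB128 byte.

    Assumes a dense numbering starting at 0, which is an assumption of this sketch.
    """
    return sum(1 for opcode in range(opcode_count) if uleb128_len(opcode) > 1)
```

Under that assumption, a proposal with at most 128 opcodes costs the same in both schemes, while each opcode beyond the 128th costs one extra byte per use with LEB128.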

@kmiller68

How many SIMD instructions do we actually expect to be in a program? I would guess they make up less than 0.1% of the bytes of the module? Maybe I'm wildly off here though. If it's only a few bytes total one way or the other, it seems kinda silly to worry about it for space reasons.

@sunfishcode
Member

My memory of the discussion @binji mentioned above was that "prefix plus LEB128" was meant to be a rule that all prefixes could follow, for the sake of consistency, rather than having each prefix have its own opcode encoding scheme.

Fortunately, I think the LLVM issue is fixable (and have commented in the bug).

@tlively
Member Author

tlively commented Oct 24, 2018

I agree that each prefix should have the same opcode encoding scheme. I should have been clearer that my point about LEB128 taking more space for SIMD was less about space concerns and more to point out that SIMD is the only proposal for which the choice of encoding (LEB128 vs bytes) makes a material difference. My understanding is that LEB128 is generally used to make the encoding of small values small while making the encoding of large values possible, but this power is not necessary for opcodes. LEB128 is unnecessary complexity in this context, and the fact that multiple proposal texts and tools use bytes instead of LEB128 suggests that byte encoding is both sufficient and more natural.

But as everyone has pointed out, fixing the proposals and tools to use LEB128 is not too burdensome, so if there's no interest in revisiting this, that's fine with me.

@tlively
Member Author

tlively commented Oct 31, 2018

@binji re: new information, it looks like earlier in the notes you linked, the idea of LEB128 opcode encoding was first brought up for the following reason:

"The current SIMD proposal defines 193 new operations, and it is likely that more SIMD operations will be added in the future. We need an extensible way of encoding the opcodes for these operations."

We now know that the SIMD proposal will definitely not have that many operations.

@sunfishcode
Member

In the future, one possible path for SIMD is to evolve away from being an "intersection" API that covers roughly what's good on all platforms to a "union" API that includes more features (possibly by providing a way to query the engine to see what's fast). If that happens, it could add hundreds more opcodes.

@binji
Member

binji commented Nov 1, 2018

There's also the possibility of adding larger SIMD widths, which would increase the instruction count too. And if we limit the number of instructions that can be specified after a prefix, we'll need to use more prefixes, or perhaps an ad-hoc mechanism for extending the instruction space.
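The trade-off between the two schemes can be put in numbers with a small sketch. The helper names are hypothetical, and the cap of 2**32 opcodes assumes the LEB128 opcode is read as a u32, which is an assumption of this example rather than something stated in the thread. The figure 193 is the operation count quoted earlier in this issue; any larger count used with these helpers would be made up for illustration.

```python
SINGLE_BYTE_OPCODES_PER_PREFIX = 256   # one byte after the prefix
LEB128_OPCODES_PER_PREFIX = 2 ** 32    # assumed: LEB128 opcode decoded as a u32


def ceil_div(numerator, denominator):
    """Integer ceiling division without floating point."""
    return -(-numerator // denominator)


def prefixes_needed_single_byte(opcode_count):
    """Prefix bytes required if each prefix is followed by exactly one opcode byte."""
    return max(1, ceil_div(opcode_count, SINGLE_BYTE_OPCODES_PER_PREFIX))


def prefixes_needed_leb128(opcode_count):
    """Prefix bytes required if each prefix is followed by a LEB128 opcode."""
    return max(1, ceil_div(opcode_count, LEB128_OPCODES_PER_PREFIX))
```

With 193 operations both schemes fit under one prefix. If the count grew by a few hundred, the single-byte scheme would need a second prefix, while the LEB128 scheme would still need only one.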

@tlively
Member Author

tlively commented Nov 2, 2018

Alright, I'm convinced. Thanks!


4 participants