instruction encoding #46

tlively · 2018-10-23T01:35:08Z

The proposal says, "All SIMD instructions are encoded as a 0xfd prefix byte followed by a SIMD-specific opcode in LEB128 format," but this is not consistent with the instruction encoding in other proposals. For example, the non-trapping float-to-int and threads proposals both encode their instructions as a single prefix byte followed by a single byte identifying the instruction.
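To make the two schemes being compared concrete, here is a minimal Python sketch. The `0xfd` prefix comes from the proposal text quoted above; the function names and the opcode values used in the usage note are hypothetical and only illustrate the byte layout.

```python
SIMD_PREFIX = 0xFD  # prefix byte quoted from the proposal text


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F       # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_prefixed_leb(opcode: int, prefix: int = SIMD_PREFIX) -> bytes:
    """Prefix byte followed by the opcode in LEB128 (the scheme in the proposal)."""
    return bytes([prefix]) + encode_uleb128(opcode)


def encode_prefixed_byte(opcode: int, prefix: int = SIMD_PREFIX) -> bytes:
    """Prefix byte followed by exactly one opcode byte (the two-byte scheme)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode does not fit in a single byte")
    return bytes([prefix, opcode])
```

For opcodes below 128 the two functions produce identical bytes; they only diverge at 128 and above, where the LEB128 form needs a second opcode byte (for example, a hypothetical opcode `0x80` becomes `fd 80 01` rather than `fd 80`).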

The only reason I see to use LEB128 for SIMD instructions is to avoid running out of opcodes. But we don't seem to be in danger of running out of opcodes for 128-bit SIMD, and if we were ever going to add many more SIMD instructions we could always use more prefixes as well.

Since this divergence from the norm seems unnecessary, I propose we change the encoding of SIMD instructions to be just two bytes, as in other proposals. Is there any context for this discussion that I am missing?

Adopting this change would obviate this LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=39272.

binji · 2018-10-23T07:18:20Z

We discussed this in the May 2017 CG meeting, and decided on a single prefix byte followed by LEB128 code. We can revisit this, but generally we only do so if there is new information.

If the other proposals are using a single byte followed by a single byte for opcode, then they should be updated to allow for a LEB128 encoding. We can generally encode this using a single byte, but we should allow for "long" encodings as well.
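As a rough illustration of what "a single prefix byte followed by LEB128 code" means for a decoder, the sketch below reads one prefixed opcode from a byte string. The function name and return shape are hypothetical, and the sketch does not enforce any maximum length or minimal-length rule that a real validator might require.

```python
def decode_prefixed_opcode(data: bytes, offset: int = 0):
    """Read a prefix byte and an unsigned LEB128 opcode.

    Returns (prefix, opcode, next_offset).
    """
    if offset >= len(data):
        raise ValueError("truncated input: missing prefix byte")
    prefix = data[offset]
    offset += 1

    opcode = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated input: unterminated LEB128 opcode")
        byte = data[offset]
        offset += 1
        opcode |= (byte & 0x7F) << shift  # accumulate 7 payload bits
        if not byte & 0x80:               # continuation bit clear: done
            return prefix, opcode, offset
        shift += 7
```

The same loop handles the common one-byte case (`fd 10`) and longer encodings (`fd 80 01`), so a tool that already decodes a single opcode byte only needs this loop to accept both.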

tlively · 2018-10-23T21:00:52Z

It still seems weird to me, since all of the MVP opcodes are bytes rather than LEB128 values. SIMD is unique among proposed extensions in that it has enough opcodes that using LEB128 to encode them will require more space than just using two bytes.
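The space argument can be sketched numerically. The snippet below assumes, purely for illustration, that opcodes are numbered densely from 0 and that each one appears exactly once; neither assumption comes from the proposal.

```python
def uleb128_len(value: int) -> int:
    """Number of bytes needed to encode value as unsigned LEB128."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def total_bytes(num_opcodes: int):
    """Bytes used to emit each of opcodes 0..num_opcodes-1 once.

    Returns (prefix_plus_leb128, prefix_plus_single_byte).
    """
    leb = sum(1 + uleb128_len(op) for op in range(num_opcodes))
    fixed = 2 * num_opcodes  # one prefix byte + one opcode byte each
    return leb, fixed
```

Under these assumptions the two schemes cost the same up to 128 opcodes, and every opcode numbered 128 or higher costs one extra byte with LEB128.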

kmiller68 · 2018-10-23T21:10:40Z

How many SIMD instructions do we actually expect to be in a program? I would guess it makes up less than .1% of the bytes of the module? Maybe I'm wildly off here though. If it's only a few bytes total one way or the other it seems kinda silly to worry about it for space reasons.

sunfishcode · 2018-10-23T21:30:17Z

My memory of the discussion @binji mentioned above was that "prefix plus LEB128" was meant to be a rule that all prefixes could follow, for the sake of consistency, rather than having each prefix have its own opcode encoding scheme.

Fortunately, I think the LLVM issue is fixable (and have commented in the bug).

tlively · 2018-10-24T19:59:43Z

I agree that each prefix should have the same opcode encoding scheme. I should have been clearer that my point about LEB128 taking more space for SIMD was less about space concerns and more to point out that SIMD is the only proposal for which the choice of encoding (LEB128 vs bytes) makes a material difference. My understanding is that LEB128 is generally used to make the encoding of small values small while making the encoding of large values possible, but this power is not necessary for opcodes. LEB128 is unnecessary complexity in this context, and the fact that multiple proposal texts and tools use bytes instead of LEB128 suggests that byte encoding is both sufficient and more natural.

But as everyone has pointed out, fixing the proposals and tools to use LEB128 is not too burdensome, so if there's no interest in revisiting this, that's fine with me.

tlively · 2018-10-31T01:13:44Z

@binji re: new information, it looks like earlier in the notes you linked, the idea of LEB128 opcode encoding was first brought up for the following reason:

"The current SIMD proposal defines 193 new operations, and it is likely that more SIMD operations will be added in the future. We need an extensible way of encoding the opcodes for these operations."

We now know that the SIMD proposal will definitely not have that many operations.

sunfishcode · 2018-10-31T01:50:47Z

In the future, one possible path for SIMD is to evolve away from being an "intersection" API that covers roughly what's good on all platforms to a "union" API that includes more features (possibly by providing a way to query the engine to see what's fast). If that happens, it could add hundreds more opcodes.

binji · 2018-11-01T16:57:58Z

There's also the possibility of adding larger SIMD widths, which would increase the instruction count too. And if we limit the number of instructions that can be specified after a prefix, we'll need to use more prefixes, or perhaps an ad-hoc mechanism for extending the instruction space.
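A back-of-the-envelope sketch of the trade-off described here: with a fixed single opcode byte, each prefix covers at most 256 instructions, while a LEB128 opcode grows by a factor of 128 per extra byte. The function names are hypothetical, and the sketch assumes every prefix's full 256-value space is usable.

```python
def prefixes_needed(num_opcodes: int, per_prefix: int = 256) -> int:
    """Prefix bytes required if each prefix is followed by one opcode byte."""
    return -(-num_opcodes // per_prefix)  # ceiling division


def leb128_capacity(opcode_bytes: int) -> int:
    """Distinct opcodes reachable behind one prefix with this many LEB128 bytes."""
    return 2 ** (7 * opcode_bytes)
```

For instance, a hypothetical set of 600 instructions would need three prefixes under the fixed two-byte scheme, but fits behind one prefix with a two-byte LEB128 opcode.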

tlively · 2018-11-02T00:50:16Z

Alright, I'm convinced. Thanks!

tlively closed this as completed Nov 2, 2018

tlively mentioned this issue Apr 2, 2020

Opcode renumbering #209

Merged
