exec: initial commit of execgen tool #31554

solongordon · 2018-10-17T19:21:31Z

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

cockroach-teamcity · 2018-10-17T19:21:37Z

This change is

jordanlewis

Great!

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/exec/.gitignore, line 1 at r1 (raw file):

rowstovec.og.go

I'd just make this *.og.go so we don't have to keep editing it

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):

// EncDatumRowsToColVec converts one column from EncDatumRows to a column
// vector.

Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):

	nRows := uint16(len(rows))
	{{range .}}
	if columnType.SemanticType == sqlbase.{{.SemanticType}} && columnType.Width == {{.Width}} {

I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.

I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.

Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.

tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 56 at r1 (raw file):

				vec.SetNull(i)
			} else {
				{{if .HasSetMethod}}

Nice - this is a lot less gross than what I was fearing, now that I see it written down. I suppose a similar approach could probably go well for the infix operators too - just a simple if.

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):

}

var columnConversions = []columnConversion{

I would consider making this slice generated from a map in an init function.

The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.

For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).

For DatumToPhysicalFn I'm sure there's something you can do too.

Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.

jordanlewis · 2018-10-18T02:30:44Z

I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though.

Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code.

I'm envisioning a registerGenerator function that each operator will call in its init function that adds a generator to a list of things that execgen can generate. That will minimize the number of things that has to change each time a new operator template is added - and I think there will be lots of these templates over time.

Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. int32 for exec.Int32). I'm hoping the columnConversions slice or something like it can grow over time into a generic all-types slice that every operator can iterate over in the same way to generate all of its variants. Not sure if it'll work out in practice, though.

solongordon

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/exec/.gitignore, line 1 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'd just make this *.og.go so we don't have to keep editing it

Done.

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).

Done.

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.

I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.

Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.

tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do

Yes, I meant to leave a TODO here. I'm doing the simple thing for now and punting on making it more efficient until we handle more types. I'll add a comment.

pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I would consider making this slice generated from a map in an init function.

The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.

For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).

For DatumToPhysicalFn I'm sure there's something you can do too.

Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.

Done. I'm saving the last two for late. (I fiddled with suing reflection for HasSetMethod but it wasn't feeling worth the time at this point. And I think it would require exporting all the MemColumn stuff.)

Also I put this logic in the generator function rather than an init() for now since no one else needs the columnConversions so far.

benesch

Reviewable status: complete! 0 of 0 LGTMs obtained

Makefile, line 1315 at r2 (raw file):

pkg/sql/exec/rowstovec.og.go: bin/execgen
	execgen -out $@ rowstovec

nit: if you go through CI again, move this above the optgen variable declarations

solongordon

I went ahead and added the registerGenerator functionality. Also changed it to pick the generator based on the specified filename, which reduces how much we have to touch in both execgen/main.go and Makefile when we add new generators.

Reviewable status: complete! 0 of 0 LGTMs obtained

jordanlewis

Reviewable status: complete! 0 of 0 LGTMs obtained

Execgen will be our tool for generating templated code necessary for columnarized execution. So far it only generates the EncDatumRowsToColVec function, which is used by the columnarizer to convert a RowSource into a columnarized Operator. Release note: None

solongordon · 2018-10-22T19:46:04Z

bors r+

31554: exec: initial commit of execgen tool r=solongordon a=solongordon Execgen will be our tool for generating templated code necessary for columnarized execution. So far it only generates the EncDatumRowsToColVec function, which is used by the columnarizer to convert a RowSource into a columnarized Operator. Release note: None 31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir Prior to this patch, all columns in the index were included instead of only the ones being used in the foreign key reference. Fixes #31545. Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from including columns that were not involved in the foreign key reference. 31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf Waiting for the upstream PR etcd-io/etcd#10199 to merge, but this is going to be what the result will look like. ---- The tracking of the uncommitted portion of the log had a bug where it wasn't releasing everything as it should've. As a result, over time, all proposals would be dropped. We're hitting this way earlier in our import tests, which propose large proposals. As an intentional implementation detail, a proposal that itself exceeds the max uncommitted log size is allowed only if the uncommitted log is empty. Due to the leak, we weren't ever hitting this case and so AddSSTable commands were often dropped indefinitely. Fixes #31184. Fixes #28693. Fixes #31642. Optimistically: Fixes #31675. Fixes #31654. Fixes #31446. Release note: None Co-authored-by: Solon Gordon <solon@cockroachlabs.com> Co-authored-by: Bram Gruneir <bram@cockroachlabs.com> Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>

craig · 2018-10-22T21:53:14Z

Build succeeded

GitHub CI (Cockroach)

31705: exec: get rid of Bools and Bytes; add some testing utils r=jordanlewis a=jordanlewis opTestInput and opTestOutput allow testing operators without having to construct ColBatches by hand, with a natural syntax inspired by ProcessorTestCase. Also, get rid of `Bools` in favor of `[]bool`, and `Bytes` in favor of `[][]byte`, as it's up to 4x slower at least on the Distinct benchmark. Split from #31693. Based on #31554. Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

31693: exec: implement int64 sorted distinct r=jordanlewis a=jordanlewis This operator runs sorted distinct over n columns, populating the selection vector with the indices of the tuples that are distinct in the input. This will be execgen templated later. ``` BenchmarkSortedDistinct-8 500000 3924 ns/op 6262.34 MB/s ``` Few other changes here: - use []bool, not Bools, for boolean columns. Turns out the Bools idea was really slow - 3-4x worse on the benchmark. - added a testing util library that uses a little Go reflection to make writing testcases easier. - added the boolean zero operator All based on top of #31554. Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

solongordon requested review from jordanlewis and a team October 17, 2018 19:21

jordanlewis requested changes Oct 17, 2018

View reviewed changes

solongordon commented Oct 18, 2018

View reviewed changes

solongordon force-pushed the initial-execgen branch from 2c55c37 to 3d2038d Compare October 18, 2018 18:42

benesch reviewed Oct 18, 2018

View reviewed changes

solongordon force-pushed the initial-execgen branch from 3d2038d to 22612b6 Compare October 18, 2018 20:47

solongordon commented Oct 18, 2018

View reviewed changes

jordanlewis approved these changes Oct 18, 2018

View reviewed changes

solongordon force-pushed the initial-execgen branch from 22612b6 to d6871fe Compare October 22, 2018 12:16

jordanlewis mentioned this pull request Oct 22, 2018

exec: implement int64 sorted distinct #31693

Merged

solongordon force-pushed the initial-execgen branch from d6871fe to 80880c9 Compare October 22, 2018 14:17

jordanlewis mentioned this pull request Oct 22, 2018

exec: get rid of Bools and Bytes; add some testing utils #31705

Merged

solongordon force-pushed the initial-execgen branch from 80880c9 to c106c32 Compare October 22, 2018 19:06

craig bot merged commit c106c32 into cockroachdb:master Oct 22, 2018

solongordon deleted the initial-execgen branch October 23, 2018 12:00

solongordon mentioned this pull request Nov 13, 2018

exec: implement execgen for columnarizer functions and arithmetic select/project ops #31518

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

exec: initial commit of execgen tool #31554

exec: initial commit of execgen tool #31554

solongordon commented Oct 17, 2018

cockroach-teamcity commented Oct 17, 2018

jordanlewis left a comment

jordanlewis commented Oct 18, 2018

solongordon left a comment

benesch left a comment

solongordon left a comment

jordanlewis left a comment

solongordon commented Oct 22, 2018

craig bot commented Oct 22, 2018

exec: initial commit of execgen tool #31554

exec: initial commit of execgen tool #31554

Conversation

solongordon commented Oct 17, 2018

cockroach-teamcity commented Oct 17, 2018

jordanlewis left a comment

Choose a reason for hiding this comment

jordanlewis commented Oct 18, 2018

solongordon left a comment

Choose a reason for hiding this comment

benesch left a comment

Choose a reason for hiding this comment

solongordon left a comment

Choose a reason for hiding this comment

jordanlewis left a comment

Choose a reason for hiding this comment

solongordon commented Oct 22, 2018

craig bot commented Oct 22, 2018

Build succeeded