-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: initial commit of execgen tool #31554
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/exec/.gitignore, line 1 at r1 (raw file):
rowstovec.og.go
I'd just make this *.og.go so we don't have to keep editing it
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):
// EncDatumRowsToColVec converts one column from EncDatumRows to a column // vector.
Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):
nRows := uint16(len(rows)) {{range .}} if columnType.SemanticType == sqlbase.{{.SemanticType}} && columnType.Width == {{.Width}} {
I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.
I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.
Finally, I think the code will be more efficient with a switch
on SemanticType
the way it's currently implemented in columnarizer.go
. That's because switch
does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.
tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 56 at r1 (raw file):
vec.SetNull(i) } else { {{if .HasSetMethod}}
Nice - this is a lot less gross than what I was fearing, now that I see it written down. I suppose a similar approach could probably go well for the infix operators too - just a simple if.
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):
} var columnConversions = []columnConversion{
I would consider making this slice generated from a map in an init
function.
The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL
. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name
) that gives you back "BOOL"
for ColumnType_Bool
. For the ExecType
, I would consider using FromColumnType
on the key of the map to get the actual value of the execType, and then adding a new map to types.go
similar to ColumnType_SemanticType_name
that gives you back the actual variable name.
For HasSetMethod
, you can use the reflect package. Do newMemColumn
on the type, then reflect.TypeOf
the inner column of the type, then ask it if it reflect.Implements
a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }
).
For DatumToPhysicalFn
I'm sure there's something you can do too.
Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.
I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though. Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code. I'm envisioning a Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/exec/.gitignore, line 1 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I'd just make this *.og.go so we don't have to keep editing it
Done.
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).
Done.
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.
I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.
Finally, I think the code will be more efficient with a
switch
onSemanticType
the way it's currently implemented incolumnarizer.go
. That's becauseswitch
does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do
Yes, I meant to leave a TODO here. I'm doing the simple thing for now and punting on making it more efficient until we handle more types. I'll add a comment.
pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I would consider making this slice generated from a map in an
init
function.The key of the map can be the actual semantic type value - e.g.
ColumnType_BOOL
. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name
) that gives you back"BOOL"
forColumnType_Bool
. For theExecType
, I would consider usingFromColumnType
on the key of the map to get the actual value of the execType, and then adding a new map totypes.go
similar toColumnType_SemanticType_name
that gives you back the actual variable name.For
HasSetMethod
, you can use the reflect package. DonewMemColumn
on the type, thenreflect.TypeOf
the inner column of the type, then ask it if itreflect.Implements
a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }
).For
DatumToPhysicalFn
I'm sure there's something you can do too.Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.
Done. I'm saving the last two for late. (I fiddled with suing reflection for HasSetMethod
but it wasn't feeling worth the time at this point. And I think it would require exporting all the MemColumn
stuff.)
Also I put this logic in the generator function rather than an init()
for now since no one else needs the columnConversions
so far.
2c55c37
to
3d2038d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
Makefile, line 1315 at r2 (raw file):
pkg/sql/exec/rowstovec.og.go: bin/execgen execgen -out $@ rowstovec
nit: if you go through CI again, move this above the optgen variable declarations
3d2038d
to
22612b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added the registerGenerator
functionality. Also changed it to pick the generator based on the specified filename, which reduces how much we have to touch in both execgen/main.go
and Makefile
when we add new generators.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
22612b6
to
d6871fe
Compare
d6871fe
to
80880c9
Compare
Execgen will be our tool for generating templated code necessary for columnarized execution. So far it only generates the EncDatumRowsToColVec function, which is used by the columnarizer to convert a RowSource into a columnarized Operator. Release note: None
80880c9
to
c106c32
Compare
bors r+ |
31554: exec: initial commit of execgen tool r=solongordon a=solongordon Execgen will be our tool for generating templated code necessary for columnarized execution. So far it only generates the EncDatumRowsToColVec function, which is used by the columnarizer to convert a RowSource into a columnarized Operator. Release note: None 31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir Prior to this patch, all columns in the index were included instead of only the ones being used in the foreign key reference. Fixes #31545. Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from including columns that were not involved in the foreign key reference. 31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf Waiting for the upstream PR etcd-io/etcd#10199 to merge, but this is going to be what the result will look like. ---- The tracking of the uncommitted portion of the log had a bug where it wasn't releasing everything as it should've. As a result, over time, all proposals would be dropped. We're hitting this way earlier in our import tests, which propose large proposals. As an intentional implementation detail, a proposal that itself exceeds the max uncommitted log size is allowed only if the uncommitted log is empty. Due to the leak, we weren't ever hitting this case and so AddSSTable commands were often dropped indefinitely. Fixes #31184. Fixes #28693. Fixes #31642. Optimistically: Fixes #31675. Fixes #31654. Fixes #31446. Release note: None Co-authored-by: Solon Gordon <solon@cockroachlabs.com> Co-authored-by: Bram Gruneir <bram@cockroachlabs.com> Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Build succeeded |
31705: exec: get rid of Bools and Bytes; add some testing utils r=jordanlewis a=jordanlewis opTestInput and opTestOutput allow testing operators without having to construct ColBatches by hand, with a natural syntax inspired by ProcessorTestCase. Also, get rid of `Bools` in favor of `[]bool`, and `Bytes` in favor of `[][]byte`, as it's up to 4x slower at least on the Distinct benchmark. Split from #31693. Based on #31554. Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
31693: exec: implement int64 sorted distinct r=jordanlewis a=jordanlewis This operator runs sorted distinct over n columns, populating the selection vector with the indices of the tuples that are distinct in the input. This will be execgen templated later. ``` BenchmarkSortedDistinct-8 500000 3924 ns/op 6262.34 MB/s ``` Few other changes here: - use []bool, not Bools, for boolean columns. Turns out the Bools idea was really slow - 3-4x worse on the benchmark. - added a testing util library that uses a little Go reflection to make writing testcases easier. - added the boolean zero operator All based on top of #31554. Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.
Release note: None