Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: initial commit of execgen tool #31554

Merged
merged 1 commit into from
Oct 22, 2018

Conversation

solongordon
Copy link
Contributor

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

@solongordon solongordon requested review from jordanlewis and a team October 17, 2018 19:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/exec/.gitignore, line 1 at r1 (raw file):

rowstovec.og.go

I'd just make this *.og.go so we don't have to keep editing it


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):

// EncDatumRowsToColVec converts one column from EncDatumRows to a column
// vector.

Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):

	nRows := uint16(len(rows))
	{{range .}}
	if columnType.SemanticType == sqlbase.{{.SemanticType}} && columnType.Width == {{.Width}} {

I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.

I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.

Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.

tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 56 at r1 (raw file):

				vec.SetNull(i)
			} else {
				{{if .HasSetMethod}}

Nice - this is a lot less gross than what I was fearing, now that I see it written down. I suppose a similar approach could probably go well for the infix operators too - just a simple if.


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):

}

var columnConversions = []columnConversion{

I would consider making this slice generated from a map in an init function.

The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.

For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).

For DatumToPhysicalFn I'm sure there's something you can do too.

Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.

@jordanlewis
Copy link
Member

I have a few thoughts that I'm jotting down here, don't take these comments as work to be done on the current PR though.

Instead of hardcoding the files that will be generated in main.go, like optgen does, we probably want to have that be autogenerated as well, since we'll have a long list of generated code.

I'm envisioning a registerGenerator function that each operator will call in its init function that adds a generator to a list of things that execgen can generate. That will minimize the number of things that has to change each time a new operator template is added - and I think there will be lots of these templates over time.

Also, there are some common mappings that we'll need over and over again in templates, like the name of the vector accessor for each type (currently always exactly equal to the type enum name, I guess), the semantic type, and the Go type associated with the exec type (e.g. int32 for exec.Int32). I'm hoping the columnConversions slice or something like it can grow over time into a generic all-types slice that every operator can iterate over in the same way to generate all of its variants. Not sure if it'll work out in practice, though.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/exec/.gitignore, line 1 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'd just make this *.og.go so we don't have to keep editing it

Done.


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 34 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Can you add a comment explaining the columnIdx parameter? Seems like something that could end up being confusing for newcomers one day (all those index-to-index maps, for example).

Done.


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 44 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I'm a little suspicious about this, because I don't think Width has semantic meaning for most types - just INT, DECIMAL, CHAR, and BINARY according to the comment in structured.proto.

I suppose if they're just left zeroed on both sides (in the columnConversion as well as the ColumnType) this will just be a no-op... but since we're in template land, I feel like we might as well just remove the width equality methods since we do have the chance to do so.

Finally, I think the code will be more efficient with a switch on SemanticType the way it's currently implemented in columnarizer.go. That's because switch does a binary search to find the correct case statement. We could even use a lookup table directly, since ColumnType_SemanticType (though an int32 for some reason) doesn't have many values. Then again, this is just once per batch, so maybe who cares... I'd definitely be motivated to fix this myself, but I guess we've got more to do for a prototype than to shave every last batch-level microsecond.

tl;dr this whole comment is optional for now but would definitely yield less work for the CPU to do

Yes, I meant to leave a TODO here. I'm doing the simple thing for now and punting on making it more efficient until we handle more types. I'll add a comment.


pkg/sql/exec/execgen/cmd/execgen/rowstovec_gen.go, line 88 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I would consider making this slice generated from a map in an init function.

The key of the map can be the actual semantic type value - e.g. ColumnType_BOOL. Protobuf provides an autogenerated map of value to name (ColumnType_SemanticType_name) that gives you back "BOOL" for ColumnType_Bool. For the ExecType, I would consider using FromColumnType on the key of the map to get the actual value of the execType, and then adding a new map to types.go similar to ColumnType_SemanticType_name that gives you back the actual variable name.

For HasSetMethod, you can use the reflect package. Do newMemColumn on the type, then reflect.TypeOf the inner column of the type, then ask it if it reflect.Implements a marker interface that you can give to all of those vector types (type HasSet interface { HasSetMarker() }).

For DatumToPhysicalFn I'm sure there's something you can do too.

Perhaps the last two are too high effort and tricky to get right, but I don't think there's a strong reason not to do it for the SemanticType and ExecType names, and I think we'll likely reuse that work in other templates.

Done. I'm saving the last two for late. (I fiddled with suing reflection for HasSetMethod but it wasn't feeling worth the time at this point. And I think it would require exporting all the MemColumn stuff.)

Also I put this logic in the generator function rather than an init() for now since no one else needs the columnConversions so far.

Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


Makefile, line 1315 at r2 (raw file):

pkg/sql/exec/rowstovec.og.go: bin/execgen
	execgen -out $@ rowstovec

nit: if you go through CI again, move this above the optgen variable declarations

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added the registerGenerator functionality. Also changed it to pick the generator based on the specified filename, which reduces how much we have to touch in both execgen/main.go and Makefile when we add new generators.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None
@solongordon
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Oct 22, 2018
31554: exec: initial commit of execgen tool r=solongordon a=solongordon

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir

Prior to this patch, all columns in the index were included instead of only the
ones being used in the foreign key reference.

Fixes #31545.

Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from
including columns that were not involved in the foreign key reference.

31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf

Waiting for the upstream PR

etcd-io/etcd#10199

to merge, but this is going to be what the result will look like.

----

The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should've. As a result, over
time, all proposals would be dropped. We're hitting this way earlier
in our import tests, which propose large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, we weren't ever hitting this case and so AddSSTable
commands were often dropped indefinitely.

Fixes #31184.
Fixes #28693.
Fixes #31642.

Optimistically:
Fixes #31675.
Fixes #31654.
Fixes #31446.

Release note: None

Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Copy link
Contributor

craig bot commented Oct 22, 2018

Build succeeded

@craig craig bot merged commit c106c32 into cockroachdb:master Oct 22, 2018
craig bot pushed a commit that referenced this pull request Oct 23, 2018
31705: exec: get rid of Bools and Bytes; add some testing utils r=jordanlewis a=jordanlewis

opTestInput and opTestOutput allow testing operators without having to
construct ColBatches by hand, with a natural syntax inspired by
ProcessorTestCase.

Also, get rid of `Bools` in favor of `[]bool`, and `Bytes` in favor of `[][]byte`, as it's up to 4x slower at least on the Distinct benchmark.

Split from #31693.
Based on #31554.

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@solongordon solongordon deleted the initial-execgen branch October 23, 2018 12:00
craig bot pushed a commit that referenced this pull request Oct 24, 2018
31693: exec: implement int64 sorted distinct r=jordanlewis a=jordanlewis

This operator runs sorted distinct over n columns, populating the
selection vector with the indices of the tuples that are distinct in the
input.

This will be execgen templated later.

```
BenchmarkSortedDistinct-8         500000              3924 ns/op        6262.34 MB/s
```

Few other changes here:

- use []bool, not Bools, for boolean columns. Turns out the Bools idea was really slow - 3-4x worse on the benchmark.
- added a testing util library that uses a little Go reflection to make writing testcases easier.
- added the boolean zero operator

All based on top of #31554.

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants