sql: extract k/v operations from insert/update/delete #6211

danhhz · 2016-04-21T20:46:01Z

Separates the mapping between table rows/indexes and key/values for the
mutative operations. This will be useful immediately for implementing UPSERT and
eventually for TableWriter.

I tried to keep this as pure a refactor as I could, but it was a lot simpler
to model an UPDATE with a PK change as a delete followed by an update (instead
of migrating the mess I committed before).

This change is

danhhz · 2016-04-21T20:46:23Z

cc @knz @RaduBerinde

knz · 2016-04-21T21:40:08Z

I like this refactoring very much! Agreed on modeling update on PK change as delete+insert.

Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

sql/delete.go, line 161 [r1] (raw file):
I'd rather see a more verbose method name here. eg. HasFastPath / FastPathAvailable

Comments from Reviewable

RaduBerinde · 2016-04-22T13:49:33Z

Awesome, this is great! I have various comments and I want to make another pass in more detail but the overall structure looks great.

I definitely agree on modeling a PK UPDATE as a delete + insert!

Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.

sql/delete.go, line 161 [r1] (raw file):
+1, except it should be unexported, fastPathAvailable

sql/rowwriter.go, line 32 [r1] (raw file):
I think this name is a misnomer. rowWriter doesn't actually write anything, it just helps with creating keys etc. I would name it maybe rowEncoder or rowWriteHelper.

sql/rowwriter.go, line 35 [r1] (raw file):
Which columns are in this map? Only the columns we are providing a value for? Including default values? This should be mentioned in a field comment, and in a comment for makeRowWriter. I am assuming this doesn't contain all the columns in the table, because then I don't see why the require.. functions below are useful.

sql/rowwriter.go, line 94 [r1] (raw file):
why is this exported? would a better name be columnInPK? Maybe add a TODO to move this functionality to TableDescriptor (I filed #6233, you can reference that)

sql/rowwriter.go, line 96 [r1] (raw file):
This doesn't work as you expect with a non-pointer receiver.. The "new" rw is gone when the function ends

sql/rowwriter.go, line 107 [r1] (raw file):
I am not convinced composing makes the most sense here. rowWriter just provides some helper functionality, I would make it a regular field.

sql/rowwriter.go, line 114 [r1] (raw file):
needs a description of what the map contains (in the context of an insert, and w.r.t default values)

sql/rowwriter.go, line 142 [r1] (raw file):
insertRow. The comment should be more accurate about what it's actually doing // insertRow adds to the batch the CPut operations necessary to insert a table row with the given values

sql/rowwriter.go, line 224 [r1] (raw file):
same - map needs a description in the context of update, and the relation between the columns in the map and updateCols

sql/rowwriter.go, line 227 [r1] (raw file):
why are we passing full column descriptors instead of just column IDs (or index in TableDesc.Columns)?

sql/rowwriter.go, line 288 [r1] (raw file):
Here we are making three rowWriters.. Might not matter but seems a bit wasteful. At the very least we could stop making rw and just using rd or ri for that functionality (maybe make rw a pointer to rd.rowWriter). Or do we expect these rowWriters to be different from each other (I haven't looked closely if they handle mutations differently)?

sql/rowwriter.go, line 319 [r1] (raw file):
updateRow. Same as before, the comment should say that we are adding ops to the batch, not actually performing the update.

sql/rowwriter.go, line 437 [r1] (raw file):
what does the map contain? all the columns? If yes why do we take it as an argument instead of computing it in here?

sql/rowwriter.go, line 452 [r1] (raw file):
all these functions should be unexported.

Also, I just noticed the receiver, all these should take a pointer receiver or we may be making full copies of these structs every time! Applies to rowUpdater and rowInserter as well.

Comments from Reviewable

danhhz · 2016-04-22T18:38:46Z

Thanks for the review! RFAL

Review status: 0 of 4 files reviewed at latest revision, 14 unresolved discussions, some commit checks pending.

sql/delete.go, line 161 [r1] (raw file):
Done.

sql/rowwriter.go, line 32 [r1] (raw file):
Done.

sql/rowwriter.go, line 35 [r1] (raw file):
Ugh. It's really complicated, depends on which operation(s) you're doing, and probably needs a cleanup. I added docs all over the place, but I'm going to clean it up more once I see how upsert affects everything.

sql/rowwriter.go, line 94 [r1] (raw file):
Done.

sql/rowwriter.go, line 96 [r1] (raw file):
Oops! Good catch.

sql/rowwriter.go, line 107 [r1] (raw file):
Fair, I was 50/50 myself.

sql/rowwriter.go, line 114 [r1] (raw file):
Done.

sql/rowwriter.go, line 142 [r1] (raw file):
Done.

sql/rowwriter.go, line 224 [r1] (raw file):
Done.

sql/rowwriter.go, line 227 [r1] (raw file):
That's what was happening before. Trying to keep the refactor minimal, I guess.

sql/rowwriter.go, line 288 [r1] (raw file):
The indexes given to each are different. I want to look into unifying them, but I'm worried that upsert might invalidate anything I do now. Added a TODO.

sql/rowwriter.go, line 319 [r1] (raw file):
Done.

sql/rowwriter.go, line 437 [r1] (raw file):
Added a comment.

I realize I should have been way more clear about the role I saw of this commit. It's mostly setting up the structure and trying to change as little as possible otherwise.

REPLACE is going to build on this, and I'm hoping it'll help me see some parts of this that are still not clear. Which component is responsible for deciding which columns have to be provided (and sharing colIDtoRowIndex) is part of that.

sql/rowwriter.go, line 452 [r1] (raw file):
Done.

(Argh, I was trying to reduce allocations. Copying the struct shouldn't be that expensive, but I forgot it would do that for every row while the alloc only happens once per statement.)

Comments from Reviewable

RaduBerinde · 2016-04-22T19:57:22Z

Reviewed 4 of 4 files at r1, 4 of 4 files at r2.
Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.

sql/rowwriter.go, line 288 [r1] (raw file):
Ah, then it's fine, I didn't have a clear understanding of how the column map differs between them.
One thing that we could do is avoid initializing rd and ri if primaryKeyColChange is false (right?)

sql/rowwriter.go, line 437 [r1] (raw file):
Ah, got it. You can ignore some of the comments I made so far about the columns passed to the "constructors"

sql/rowwriter.go, line 452 [r1] (raw file):
Reducing allocations is a bit perpendicular to this. A pointer receiver just means that the function can modify the struct in place. It doesn't imply that you have to call it through a pointer, and it doesn't even necessarily imply that the struct has to be on the heap (though it can cause that to happen). In our case it would have been on the heap anyway because it's all part of some bigger struct that is we allocate (like insertNode).

Anyway, I doubt it would matter.rowHelper already does some internal allocations anyway (for the slices, maps etc) and (as you pointed out) it's a once-per-statement kind of thing.

sql/rowwriter.go, line 55 [r2] (raw file):
It's fine either way bit I wanted to point out that the make* functions could have stayed as they were. There is no requirement that just because a struct has pointer-receiver methods that it has to be addressed through a pointer. It can be a non-pointer field in a struct.

sql/rowwriter.go, line 122 [r2] (raw file):
can be a non-pointer rowHelper if you are not sharing pointers across structs

sql/rowwriter.go, line 129 [r2] (raw file):
newRowInserter

sql/rowwriter.go, line 133 [r2] (raw file):
do we require all non-nullable columns to be part of cols? (i'm assuming yes)

sql/rowwriter.go, line 251 [r2] (raw file):
mentioning the 1:1 relationship between the map and the values is fine but we're still not specifying what columns these are expected to be.. I assume it has to be the union of:

all the columns we are actually changing
all PK columns
all columns that are part of an index
This needs to be specified somewhere..

Comments from Reviewable

Separates the mapping between table rows/indexes and key/values for the mutative operations. This will be useful immediately for implementing UPSERT and eventually for TableWriter. I tried to keep this as pure a refactor as I could, but it was a lot simpler to model an UPDATE with a PK change as a delete followed by an update (instead of migrating the mess I committed before). I'm still not clear which component decides which columns are required, but I think the additionally complexity that UPSERT/REPLACE adds will make it more obvious one way or another. Once that happens it should be more obvious how to handle all these colIDtoRowIndex maps.

danhhz · 2016-04-25T19:06:28Z

Review status: 0 of 4 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.

sql/rowwriter.go, line 288 [r1] (raw file):
Good call. Done.

sql/rowwriter.go, line 55 [r2] (raw file):
Got it, I'm still figuring out when I should use which. Just asked David and Nathan and it got the impression that it's more idiomatic for me to switch these back, so I will.

sql/rowwriter.go, line 122 [r2] (raw file):
Done.

sql/rowwriter.go, line 129 [r2] (raw file):
Switched back to make

sql/rowwriter.go, line 133 [r2] (raw file):
There's a separate check in Insert. I could be convinced otherwise, but I think this layer doesn't care about nullability

sql/rowwriter.go, line 251 [r2] (raw file):
Okay, added that information.

This code review has convinced me that the colIDtoRowIndex map is a bit of a maintenance nightmare. I'm going to make a pointed effort to fix it up somehow in the next commit

Comments from Reviewable

danhhz force-pushed the rowwriter branch from 16de381 to 6c453c9 Compare April 22, 2016 18:38

petermattis mentioned this pull request Apr 22, 2016

sql: improve client.Batch creation on INSERT/DELETE/UPDATE #6248

Closed

danhhz force-pushed the rowwriter branch from 6c453c9 to cf5b34c Compare April 22, 2016 18:44

danhhz force-pushed the rowwriter branch 3 times, most recently from 8f43eff to 6a75969 Compare April 25, 2016 18:15

danhhz force-pushed the rowwriter branch from 6a75969 to 9115966 Compare April 25, 2016 18:46

danhhz merged commit 5f6e9e1 into cockroachdb:master Apr 25, 2016

danhhz deleted the rowwriter branch April 25, 2016 19:06

danhhz mentioned this pull request Apr 25, 2016

sql: any plan of implementing 'UPSERT' or 'ON DUPLICATE KEY UPDATE' ? #1962

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: extract k/v operations from insert/update/delete #6211

sql: extract k/v operations from insert/update/delete #6211

danhhz commented Apr 21, 2016

danhhz commented Apr 21, 2016

knz commented Apr 21, 2016

RaduBerinde commented Apr 22, 2016

danhhz commented Apr 22, 2016

RaduBerinde commented Apr 22, 2016

danhhz commented Apr 25, 2016

sql: extract k/v operations from insert/update/delete #6211

sql: extract k/v operations from insert/update/delete #6211

Conversation

danhhz commented Apr 21, 2016

danhhz commented Apr 21, 2016

knz commented Apr 21, 2016

RaduBerinde commented Apr 22, 2016

danhhz commented Apr 22, 2016

RaduBerinde commented Apr 22, 2016

danhhz commented Apr 25, 2016