Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct to string casting functionality #1814

Merged
merged 24 commits into from
Mar 30, 2021

Conversation

rwlee
Copy link
Collaborator

@rwlee rwlee commented Feb 25, 2021

Resolves #1698
Requires rapidsai/cudf#7406 -- draft until that PR is merged

Inherits incompatibility of base casting -- e.g. floats -- because casting is used recursively

@rwlee rwlee added feature request New feature or request cudf_dependency An issue or PR with this label depends on a new feature in cudf labels Feb 25, 2021
@rwlee rwlee marked this pull request as ready for review March 10, 2021 10:49
@rwlee rwlee requested a review from revans2 March 15, 2021 22:25
integration_tests/src/main/python/struct_test.py Outdated Show resolved Hide resolved
integration_tests/src/main/python/struct_test.py Outdated Show resolved Hide resolved
docs/supported_ops.md Outdated Show resolved Hide resolved
columns += ColumnVector.fromScalar(bracketScalar, input.getRowCount().toInt)
}
withResource(input.getBase().getChildColumnView(0)) { childView =>
withResource(childView.copyToColumnVector()) { childVector =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to copy this to a column vector? I understand because of the APIs we have right now it is needed, but it would be nice to avoid the copy if possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would not copy this column vector, which is why I had been using toString in earlier iterations. How might I avoid this copy?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of doColumnar takes a GpuColumnVector It uses the GpuColumnVector to extract its Spark data type so it can do the cast. A ColumnVector is a ColumnView. As such I would propose that we insert another API in between that we could use for recursive calls.

override def doColumnar(input: GpuColumnVector): ColumnVector =
    doColumnar(input.getBase, input.dataType())

private def doColumnar(input: ColumnView, sparkType: DataType): COlumnVector = {
   (sparkType, dataType) match {
      ...
   }
}

Then you can call the new doColumnar without making copy of the data.

withResource(input.getBase().getChildColumnView(0)) { childView =>
  columns += doColumnar(childView, inputSchema(0).dataType)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There a few things around decimal casting that didn't translate nicely. In some cases they want to do a zero-copy, so I've filtered them out in the GpuColumnVector wrapper API. In those cases, if those cases are found when doing a ColumnView cast, it's copied to a ColumnVector. Reworking the decimal support to be ColumnView friendly version can be done in a follow up PR if necessary, but it seemed a bit out of scope and was non-trivial.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a follow on issue to clean this up? The issue appears to be because of incRefCount, and there should be ways for us to work around that without too much difficulty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@revans2
Copy link
Collaborator

revans2 commented Mar 29, 2021

build

@revans2 revans2 merged commit 200c72d into NVIDIA:branch-0.5 Mar 30, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Ryan Lee <ryanlee@nvidia.com>

Co-authored-by: Ryan Lee <ryanlee@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Ryan Lee <ryanlee@nvidia.com>

Co-authored-by: Ryan Lee <ryanlee@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support casting structs to string
2 participants