[FEA] Audit BatchScanExec #234

Closed
kuhushukla opened this issue Jun 19, 2020 · 5 comments

@kuhushukla

Is your feature request related to a problem? Please describe.

  • Ensure the Spark and RAPIDS plugin versions of the exec match in functionality.

  • Verify that the configs specific to the operator match.

  • Verify that the API is consistent and fully translated.

  • Port the relevant tests.

kuhushukla added the "feature request" (New feature or request) and "? - Needs Triage" (Need team to review and classify) labels Jun 19, 2020
kuhushukla self-assigned this Jul 2, 2020
kuhushukla removed the "? - Needs Triage" (Need team to review and classify) label Jul 2, 2020
kuhushukla added this to the Jul 6 - Jul 17 milestone Jul 2, 2020
@kuhushukla

Some of the confs that Spark sets may need to be tested:

    hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName)
    hadoopConf.set(
      ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA,
      readDataSchemaAsJson)
    hadoopConf.set(
      ParquetWriteSupport.SPARK_ROW_SCHEMA,
      readDataSchemaAsJson)
    hadoopConf.set(
      SQLConf.SESSION_LOCAL_TIMEZONE.key,
      sparkSession.sessionState.conf.sessionLocalTimeZone)
    hadoopConf.setBoolean(
      SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key,
      sparkSession.sessionState.conf.nestedSchemaPruningEnabled)
    ...
    hadoopConf.setBoolean(
      SQLConf.PARQUET_BINARY_AS_STRING.key,
      sparkSession.sessionState.conf.isParquetBinaryAsString)
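
One possible way to audit these is to dump the confs seen on the Spark path and on the plugin path into plain maps and diff them. The following is only a sketch under that assumption: `ConfAudit`, `ConfDiff`, and `diffConfs` are hypothetical names, not part of Spark or the plugin, and the sample values are made up for illustration. The key strings are what I believe the three `SQLConf` entries above resolve to.

```scala
// Hypothetical helper for the audit; not part of Spark or the RAPIDS plugin.
object ConfAudit {
  // Result of comparing the confs set on the Spark path with those on the plugin path.
  final case class ConfDiff(
      missing: Set[String],                       // set by Spark, absent on the plugin side
      mismatched: Map[String, (String, String)])  // key -> (Spark value, plugin value)

  def diffConfs(spark: Map[String, String], plugin: Map[String, String]): ConfDiff = {
    // Keys that Spark sets but the plugin never does.
    val missing = spark.keySet.diff(plugin.keySet)
    // Keys present on both sides whose values differ.
    val mismatched = spark.collect {
      case (k, v) if plugin.get(k).exists(_ != v) => (k, (v, plugin(k)))
    }
    ConfDiff(missing, mismatched)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample values; in a real audit these would be read from the two hadoopConf objects.
    val sparkSide = Map(
      "spark.sql.session.timeZone" -> "UTC",
      "spark.sql.optimizer.nestedSchemaPruning.enabled" -> "true",
      "spark.sql.parquet.binaryAsString" -> "false")
    val pluginSide =
      sparkSide - "spark.sql.parquet.binaryAsString" + ("spark.sql.session.timeZone" -> "PST")

    val d = diffConfs(sparkSide, pluginSide)
    println(s"missing on plugin side: ${d.missing}")
    println(s"mismatched values: ${d.mismatched}")
  }
}
```

An empty `missing` set and an empty `mismatched` map would suggest the plugin propagates the same settings. It would not show whether the GPU reader actually honors them, so behavior tests are still needed.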

@kuhushukla

OrcScan looks good.

@kuhushukla

For the CSV scan, we need tests for column pruning.

@kuhushukla

Will add follow-on issues here shortly.

@kuhushukla

kuhushukla commented Jul 24, 2020

See #425

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>