[BUG] Loading columns from an ORC file without column names returns no data #962

Closed
razajafri opened this issue Oct 15, 2020 · 2 comments · Fixed by #1240
Assignees
Labels
audit_3.0.1 Audit related tasks for 3.0.1 bug Something isn't working P0 Must have for release

Comments

@razajafri
Collaborator

razajafri commented Oct 15, 2020

commit b745041f698120be21ab889706880e976a599fdb
Author: SaurabhChawla <saurabhc@qubole.com>
Date: Thu Jul 16 13:11:47 2020 +0000

[SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
@razajafri razajafri added feature request New feature or request ? - Needs Triage Need team to review and classify audit_3.0.1 Audit related tasks for 3.0.1 labels Oct 15, 2020
@razajafri razajafri changed the title [FEA] We probably need to make this same change in the GpuOrcFileFormat [FEA] Audit_3.0.1: We probably need to make this same change in the GpuOrcFileFormat Oct 15, 2020
@razajafri razajafri changed the title [FEA] Audit_3.0.1: We probably need to make this same change in the GpuOrcFileFormat [FEA] Audit_3.0.1: GpuOrcFileFormat might need to be changed as a part of this change Oct 15, 2020
@sameerz sameerz added P0 Must have for release and removed ? - Needs Triage Need team to review and classify labels Oct 20, 2020
@kuhushukla
Collaborator

@razajafri Can you elaborate on what the fix would be, and change the priority if needed?

@razajafri
Collaborator Author

    // Create an ORC-backed table whose column names follow the `_colN` pattern.
    val table = """CREATE TABLE `test_orc_data` (
      `_col1` INT,
      `_col2` STRING,
      `_col3` INT)
      USING orc"""

    spark.sql(table).collect

    // Insert a single row.
    spark.sql("insert into test_orc_data values(13, '155', 2020)").collect

    // Select only the middle column.
    val df = """select _col2 from test_orc_data limit 5"""
    spark.sql(df).collect

Running the above example on Spark 3.0.1 with the rapids-plugin turned off produces the following:

res7: Array[org.apache.spark.sql.Row] = Array([155])

With the rapids-plugin turned on, it produces:

20/11/11 00:25:05 WARN SchemaEvolution: Column names are missing from this file. This is caused by a writer earlier than HIVE-4243. The reader will reconcile schemas based on index. File type: struct<_col1:int,_col2:string,_col3:int>, reader type: struct<_col2:string>
res3: Array[org.apache.spark.sql.Row] = Array([])

It seems the GPU version doesn't crash but returns incorrect results because the column names aren't passed.
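To make the suspected mismatch concrete, here is a minimal, Spark-free sketch of resolving requested columns when the file's field names all look like `_colN` placeholders. It is only loosely modeled on the idea behind the referenced Spark commit and is not the actual Spark or spark-rapids code. The object name, method names, and the `-1` "not found" convention are all assumptions made for illustration.

```scala
// Hypothetical sketch: none of these names come from Spark or spark-rapids.
object OrcColumnResolution {

  /** True when every physical field name looks like a placeholder such as `_col1`. */
  def namesAreMissing(fileFieldNames: Seq[String]): Boolean =
    fileFieldNames.nonEmpty && fileFieldNames.forall(_.startsWith("_col"))

  /**
   * For each requested column, return its position in the file, or -1 if absent.
   *
   * @param fileFieldNames      field names stored in the ORC file, in file order
   * @param tableFieldNames     full table schema field names, in declaration order
   * @param requestedFieldNames columns the query actually reads (the pruned schema)
   */
  def requestedColumnIds(
      fileFieldNames: Seq[String],
      tableFieldNames: Seq[String],
      requestedFieldNames: Seq[String]): Seq[Int] = {
    if (namesAreMissing(fileFieldNames)) {
      // No usable names in the file: assume the file columns line up with the
      // full table schema by position, and look each requested column up there.
      requestedFieldNames.map { name =>
        val idx = tableFieldNames.indexOf(name)
        if (idx >= 0 && idx < fileFieldNames.length) idx else -1
      }
    } else {
      // Names are usable: match requested columns to file columns by name.
      requestedFieldNames.map(name => fileFieldNames.indexOf(name))
    }
  }
}
```

For the table in this issue, `OrcColumnResolution.requestedColumnIds(Seq("_col1", "_col2", "_col3"), Seq("_col1", "_col2", "_col3"), Seq("_col2"))` resolves `_col2` to file position 1. Matching the pruned reader type `struct<_col2:string>` against the file purely by index would instead line `_col2` up with position 0, which is the `int` column. That is consistent with the warning above, though it has not been confirmed as the root cause here.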

@revans2 revans2 changed the title [FEA] Audit_3.0.1: GpuOrcFileFormat might need to be changed as a part of this change [BUG] Audit_3.0.1: GpuOrcFileFormat might need to be changed as a part of this change Nov 11, 2020
@revans2 revans2 added bug Something isn't working and removed feature request New feature or request labels Nov 11, 2020
@razajafri razajafri self-assigned this Nov 17, 2020
@razajafri razajafri added this to the Nov 23 - Dec 4 milestone Nov 17, 2020
@sameerz sameerz assigned jlowe and unassigned razajafri Nov 20, 2020
@jlowe jlowe changed the title [BUG] Audit_3.0.1: GpuOrcFileFormat might need to be changed as a part of this change [BUG] Loading partial columns from an ORC file without column names returns no data Dec 2, 2020
@jlowe jlowe changed the title [BUG] Loading partial columns from an ORC file without column names returns no data [BUG] Loading columns from an ORC file without column names returns no data Dec 2, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…IDIA#962)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

5 participants