SPARK-3807: SparkSql does not work for tables created using custom serde #2674

chiragaggarwal · 2014-10-06T11:20:49Z

SparkSql crashes when selecting from tables created with a custom serde.

Example:

CREATE EXTERNAL TABLE table_name PARTITIONED BY (a int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ("serialization.format"="org.apache.thrift.protocol.TBinaryProtocol", "serialization.class"="ser_class") STORED AS SEQUENCEFILE;

The following exception is seen on running a query like 'select * from table_name limit 1':

ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68)
at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80)
at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86)
at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException

AmplabJenkins · 2014-10-06T11:22:09Z

Can one of the admins verify this patch?

marmbrus · 2014-10-09T19:50:09Z

ok to test

SparkQA · 2014-10-09T19:55:21Z

QA tests have started for PR 2674 at commit ba4bc0c.

This patch merges cleanly.

AmplabJenkins · 2014-10-09T20:32:34Z

Can one of the admins verify this patch?

SparkQA · 2014-10-09T21:15:03Z

QA tests have finished for PR 2674 at commit ba4bc0c.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-09T21:15:06Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21546/
Test PASSed.

marmbrus · 2014-10-10T00:57:45Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala

@@ -80,10 +80,15 @@ case class HiveTableScan(
    ColumnProjectionUtils.appendReadColumnIDs(hiveConf, neededColumnIDs)
    ColumnProjectionUtils.appendReadColumnNames(hiveConf, attributes.map(_.name))

+    val td = relation.tableDesc
+    val deClass = td.getDeserializerClass;
+    val de = deClass.newInstance();


In general we try to use descriptive variable names instead of short ones. Also, we do not use semicolons.

val deserializer = relation.tableDesc.getDeserializerClass.newInstance()
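To make the instantiate-then-initialize pattern concrete, here is a minimal self-contained sketch. It assumes (this thread does not confirm it) that the NPE in the stack trace arises because the serde's initialize is called without a usable configuration. ToyConf, ToySerDe and SerdeInitSketch are hypothetical stand-ins for illustration only, not Hive or Spark classes.

```scala
import java.util.Properties

// Hypothetical stand-in for a Hadoop/Hive configuration object.
final class ToyConf {
  def classForName(name: String): Class[_] = Class.forName(name)
}

// Hypothetical stand-in for a custom serde whose initialize() dereferences
// the configuration (the assumed cause of the NPE; not confirmed here).
final class ToySerDe {
  private var recordClass: Option[Class[_]] = None

  def initialize(conf: ToyConf, props: Properties): Unit = {
    val className = props.getProperty("serialization.class")
    // Throws NullPointerException when conf is null.
    recordClass = Some(conf.classForName(className))
  }

  def isInitialized: Boolean = recordClass.isDefined
}

object SerdeInitSketch {
  // Models the assumed failing path: a helper that creates the serde and
  // initializes it on the caller's behalf with a null configuration.
  def viaHelper(props: Properties): ToySerDe = {
    val serde = new ToySerDe
    serde.initialize(null, props)
    serde
  }

  // Models the alternative: create the instance directly, then initialize
  // it with the configuration and properties the caller already holds.
  // The explicit initialize step is an assumption about the rest of the patch.
  def viaInstance(conf: ToyConf, props: Properties): ToySerDe = {
    val serde = new ToySerDe
    serde.initialize(conf, props)
    serde
  }
}
```

In this toy model, viaHelper fails with a NullPointerException for any properties, while viaInstance succeeds as long as "serialization.class" names a loadable class.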

marmbrus · 2014-10-10T00:58:19Z

Can you also add a unit test for this? Perhaps in HiveQuerySuite.

marmbrus · 2014-10-10T00:58:32Z

/cc @yhuai

chenghao-intel · 2014-10-10T01:13:58Z

This is a good catch! +1. It would be great if a unit test were added.

yhuai · 2014-10-10T01:57:04Z

LGTM

SparkQA · 2014-10-10T05:09:51Z

QA tests have started for PR 2674 at commit 1f26805.

This patch merges cleanly.

SparkQA · 2014-10-10T06:32:03Z

QA tests have finished for PR 2674 at commit 1f26805.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-10T06:32:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21573/
Test PASSed.

chiragaggarwal · 2014-10-10T12:14:34Z

Incorporated the review comments and also added a test case.

SparkQA · 2014-10-10T12:19:52Z

QA tests have started for PR 2674 at commit 370c31b.

This patch merges cleanly.

SparkQA · 2014-10-10T13:42:19Z

QA tests have finished for PR 2674 at commit 370c31b.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-10T13:42:23Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21590/
Test PASSed.

marmbrus · 2014-10-13T20:48:42Z

Thanks! I've merged to master and 1.1.

In the future please make all PRs against master, and we will back port as needed.

SparkSql crashes on selecting tables using custom serde.

Example:
----------------

CREATE EXTERNAL TABLE table_name PARTITIONED BY (a int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ("serialization.format"="org.apache.thrift.protocol.TBinaryProtocol", "serialization.class"="ser_class") STORED AS SEQUENCEFILE;

The following exception is seen on running a query like 'select * from table_name limit 1':

ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68)
at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80)
at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86)
at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException

Author: chirag <chirag.aggarwal@guavus.com>

Closes #2674 from chiragaggarwal/branch-1.1 and squashes the following commits:

370c31b [chirag] SPARK-3807: Add a test case to validate the fix.
1f26805 [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde (Incorporated Review Comments)
ba4bc0c [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde
5c73b72 [chirag] SPARK-3807: SparkSql does not work for tables created using custom serde

SPARK-3807: SparkSql does not work for tables created using custom serde

5c73b72

SPARK-3807: SparkSql does not work for tables created using custom serde

ba4bc0c

marmbrus reviewed Oct 10, 2014
chenghao-intel mentioned this pull request Oct 10, 2014

[SPARK-3816][SQL] Add table properties from storage handler to output jobConf #2677

Closed

SPARK-3807: SparkSql does not work for tables created using custom serde (Incorporated Review Comments)

1f26805

SPARK-3807: Add a test case to validate the fix.

370c31b

asfgit closed this in e6e3770 Oct 13, 2014
