[BUG] SIGSEGV while writing ORC data #13238

Closed
abellina opened this issue Apr 27, 2023 · 4 comments · Fixed by #13240
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

Comments

@abellina
Contributor

I am running a fairly large query (so no small repro yet), and while writing ORC I see the stack trace below.

It reproduces with cuDF at: 5234278

The last cuDF I tried this with was at: 9ffd30f (and it works there).

So something changed between these two SHAs. Note that I saw failures with Parquet as well. I am going to try to bisect this further.

Stack: [0x00007f83f40de000,0x00007f83f41df000],  sp=0x00007f83f41dc100,  free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [cudf4731881092056514972.so+0x196bd3e]  std::__detail::_Map_base<std::pair<cudf::io::nvcomp::compression_type, cudf::io::nvcomp::feature_status_parameters>, std::pair<std::pair<cudf::io::nvcomp::compression_type, cudf::io::nvcomp::feature_status_parameters> const, std::optional<std::string> >, std::allocator<std::pair<std::pair<cudf::io::nvcomp::compression_type, cudf::io::nvcomp::feature_status_parameters> const, std::optional<std::string> > >, std::__detail::_Select1st, std::equal_to<std::pair<cudf::io::nvcomp::compression_type, cudf::io::nvcomp::feature_status_parameters> >, cudf::io::nvcomp::hash_feature_status_inputs, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](std::pair<cudf::io::nvcomp::compression_type, cudf::io::nvcomp::feature_status_parameters>&&)+0xbe

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  ai.rapids.cudf.Table.writeORCBufferBegin([Ljava/lang/String;I[I[Z[Ljava/lang/String;[Ljava/lang/String;I[I[ZLai/rapids/cudf/HostBufferConsumer;)J+0
j  ai.rapids.cudf.Table.access$1300([Ljava/lang/String;I[I[Z[Ljava/lang/String;[Ljava/lang/String;I[I[ZLai/rapids/cudf/HostBufferConsumer;)J+16
j  ai.rapids.cudf.Table$ORCTableWriter.<init>(Lai/rapids/cudf/ORCWriterOptions;Lai/rapids/cudf/HostBufferConsumer;)V+45
j  ai.rapids.cudf.Table$ORCTableWriter.<init>(Lai/rapids/cudf/ORCWriterOptions;Lai/rapids/cudf/HostBufferConsumer;Lai/rapids/cudf/Table$1;)V+3
j  ai.rapids.cudf.Table.writeORCChunked(Lai/rapids/cudf/ORCWriterOptions;Lai/rapids/cudf/HostBufferConsumer;)Lai/rapids/cudf/TableWriter;+7
j  org.apache.spark.sql.rapids.GpuOrcWriter.<init>(Ljava/lang/String;Lorg/apache/spark/sql/types/StructType;Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)V+67
@abellina added the bug, Needs Triage, libcudf, Java, and Spark labels Apr 27, 2023
@abellina
Contributor Author

abellina commented Apr 27, 2023

The culprit is #13132.

The problem is that #13132 made is_compression_disabled and is_decompression_disabled not thread-safe: they memoize the answers from an impl function in a map, and concurrent calls corrupt that STL container.

After reverting #13132 I no longer hit the issue.
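For anyone following along, here is a minimal sketch of the kind of memoization pattern that would produce this failure. It is not the libcudf code: `feature_params`, `status_key`, `hash_key`, `disabled_reason_impl`, and `disabled_reason_unsafe` are hypothetical names chosen for illustration, loosely modeled on the key and value types visible in the crashing `operator[]` frame above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the key types; not libcudf's definitions.
enum class compression_type { SNAPPY, ZSTD };
struct feature_params {
  int lib_version;
  bool all_enabled;
  bool operator==(feature_params const& o) const {
    return lib_version == o.lib_version && all_enabled == o.all_enabled;
  }
};
using status_key = std::pair<compression_type, feature_params>;

struct hash_key {
  std::size_t operator()(status_key const& k) const {
    std::size_t h = std::hash<int>{}(static_cast<int>(k.first));
    h = h * 31 + std::hash<int>{}(k.second.lib_version);
    return h * 31 + std::hash<bool>{}(k.second.all_enabled);
  }
};

int impl_calls = 0;  // counts how often the uncached check runs

// Hypothetical "impl" check: nullopt means the feature is not disabled.
std::optional<std::string> disabled_reason_impl(compression_type c, feature_params p) {
  ++impl_calls;
  if (c == compression_type::ZSTD && !p.all_enabled) {
    return std::string{"ZSTD requires opt-in (made-up reason)"};
  }
  return std::nullopt;
}

// Memoized wrapper. NOT thread-safe: operator[] can insert and rehash, so
// concurrent callers race on the shared map and can corrupt it.
std::optional<std::string> disabled_reason_unsafe(compression_type c, feature_params p) {
  static std::unordered_map<status_key, std::optional<std::string>, hash_key> cache;
  status_key key{c, p};
  if (auto it = cache.find(key); it != cache.end()) { return it->second; }
  return cache[key] = disabled_reason_impl(c, p);
}

// Single-threaded usage; safe only because one thread touches the cache.
void example_usage() {
  feature_params params{1, false};
  auto first  = disabled_reason_unsafe(compression_type::ZSTD, params);
  auto second = disabled_reason_unsafe(compression_type::ZSTD, params);
  assert(first.has_value() && first == second);  // second call hits the cache
}
```

Called from a single thread this behaves fine, which is why a small repro is hard to come by. With several writer tasks starting at once, two threads can reach the insert path at the same time, and that is undefined behavior for `std::unordered_map`.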

@abellina
Contributor Author

@vuule FYI, making sure you are aware. Also, thanks to @ttnghia for debugging this with me.

@abellina
Contributor Author

If this makes sense, @vuule, let me know if you want me to patch it. One option is to not memoize at all; the other is to lock access to the map.
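A rough sketch of the two options, under the same caveat that this is illustrative and not the libcudf code (and not necessarily what any eventual fix does). `feature_key`, `feature_key_hash`, `slow_disabled_reason`, `disabled_reason_uncached`, `disabled_reason_locked`, and `run_concurrent_lookups` are all hypothetical names, and the key is simplified to a pair of ints.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key: (codec id, library version). Not libcudf's real types.
using feature_key = std::pair<int, int>;

struct feature_key_hash {
  std::size_t operator()(feature_key const& k) const {
    return std::hash<int>{}(k.first) * 31 + std::hash<int>{}(k.second);
  }
};

std::atomic<int> slow_check_calls{0};  // counts calls to the uncached check

// Hypothetical slow check; nullopt means "not disabled".
std::optional<std::string> slow_disabled_reason(feature_key const& k) {
  ++slow_check_calls;
  if (k.first == 1) { return std::string{"codec 1 is disabled in this sketch"}; }
  return std::nullopt;
}

// Option 1: do not memoize. No shared state, so nothing to corrupt.
std::optional<std::string> disabled_reason_uncached(feature_key const& k) {
  return slow_disabled_reason(k);
}

// Option 2: keep the cache, but serialize every access with a mutex.
std::optional<std::string> disabled_reason_locked(feature_key const& k) {
  static std::mutex mtx;
  static std::unordered_map<feature_key, std::optional<std::string>, feature_key_hash> cache;
  std::lock_guard<std::mutex> lock{mtx};
  auto it = cache.find(k);
  if (it == cache.end()) { it = cache.emplace(k, slow_disabled_reason(k)).first; }
  return it->second;  // copied out while the lock is still held
}

// Usage: call the locked version from several threads at once.
void run_concurrent_lookups(int num_threads, int iterations) {
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([iterations] {
      for (int i = 0; i < iterations; ++i) {
        auto reason = disabled_reason_locked({i % 4, 0});
        assert(reason.has_value() == (i % 4 == 1));
      }
    });
  }
  for (auto& w : workers) { w.join(); }
}
```

The locked version holds the mutex while the slow check runs, which keeps the sketch simple and guarantees one computation per key, at the cost of serializing first-time lookups. If the check is cheap, dropping the cache is the smaller change.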

@abellina removed the Java and Spark labels Apr 27, 2023
@vuule
Contributor

vuule commented Apr 27, 2023

@abellina Thank you for finding the root cause! Can you please verify if #13240 fixes the issue?
