Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-39839][SQL] Handle special case of null variable-length Decima…
…l with non-zero offsetAndSize in UnsafeRow structural integrity check ### What changes were proposed in this pull request? Update the `UnsafeRow` structural integrity check in `UnsafeRowUtils.validateStructuralIntegrity` to handle a special case with null variable-length DecimalType value. ### Why are the changes needed? The check should follow the format that `UnsafeRowWriter` produces. In general, `UnsafeRowWriter` clears out a field with zero when the field is set to be null, c.f. `UnsafeRowWriter.setNullAt(ordinal)` and `UnsafeRow.setNullAt(ordinal)`. But there's a special case for `DecimalType` values: this is the only type that is both: - can be fixed-length or variable-length, depending on the precision, and - is mutable in `UnsafeRow`. To support a variable-length `DecimalType` to be mutable in `UnsafeRow`, the `UnsafeRowWriter` always leaves a 16-byte space in the variable-length section of the `UnsafeRow` (tail end of the row), regardless of whether the `Decimal` value being written is null or not. In the fixed-length part of the field, it would be an "OffsetAndSize", and the `offset` part always points to the start offset of the variable-length part of the field, while the `size` part will either be `0` for the null value, or `1` to at most `16` for non-null values. When `setNullAt(ordinal)` is called instead of passing a null value to `write(int, Decimal, int, int)`, however, the `offset` part gets zero'd out and this field stops being mutable. There's a comment on `UnsafeRow.setDecimal` that mentions to keep this field able to support updates, `setNullAt(ordinal)` cannot be called, but there's no code enforcement of that. So we need to recognize that in the structural integrity check and allow variable-length `DecimalType` to have non-zero field even for null. Note that for non-null values, the existing check does conform to the format from `UnsafeRowWriter`. It's only null value of variable-length `DecimalType` that'd trigger a bug, which can affect Structured Streaming's checkpoint file read where this check is applied. ### Does this PR introduce _any_ user-facing change? Yes, previously the `UnsafeRow` structural integrity validation will return false positive for correct data, when there's a null value in a variable-length `DecimalType` field. The fix will no longer return false positive. Because the Structured Streaming checkpoint file validation uses this check, previously a good checkpoint file may be rejected by the check, and the only workaround is to disable the check; with the fix, the correct checkpoint file will be allowed to load. ### How was this patch tested? Added new test case in `UnsafeRowUtilsSuite` Closes #37252 from rednaxelafx/fix-unsaferow-validation. Authored-by: Kris Mok <kris.mok@databricks.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
- Loading branch information