[SPARK-55559][SQL] Fix BIT_COUNT for negative tinyint/smallint/int #54819
Open
azmatsiddique wants to merge 4 commits into apache:master from
…n last CSV column
### What changes were proposed in this pull request?
This PR fixes an issue where the CSV reader inconsistently parses empty quoted strings (`""`) when the `escape` option is set to an empty string (`""`). Previously, mid-line empty quoted strings correctly resolved to null/empty, but the last column resolved to a literal `"` character due to univocity parser behavior.
### Why are the changes needed?
To ensure consistent parsing of CSV data regardless of column position.
### Does this PR introduce _any_ user-facing change?
Yes, it fixes a bug where users were receiving incorrect data (a literal quote instead of an empty/null value) for the last column in a row under specific CSV configurations.
### How was this patch tested?
Added a new regression test in `CSVSuite` that verifies consistent parsing of both mid-line and end-of-line empty quoted fields.
What changes were proposed in this pull request?
This PR fixes a bug in the `BIT_COUNT` expression where negative values of types `tinyint` (`ByteType`), `smallint` (`ShortType`), and `int` (`IntegerType`) were returning incorrect results.
The evaluation logic in `BitwiseCount` (`nullSafeEval` and `doGenCode`) was updated to use `java.lang.Integer.bitCount()` with appropriate bit-masking (`& 0xFF` for bytes, `& 0xFFFF` for shorts) instead of casting all smaller integer types to 64-bit `Long`s.
Why are the changes needed?
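For illustration only, the following Java sketch contrasts counting bits after widening to `long` with counting after masking to the original width. The class and method names are hypothetical and this is not the `BitwiseCount` code from the patch; it relies only on `java.lang.Long.bitCount` and `java.lang.Integer.bitCount`.

```java
// Illustrative sketch: names are hypothetical, not taken from Spark.
final class BitCountWidening {
    // Counting after widening: a negative byte/short/int passed here is
    // sign-extended to 64 bits, so all the upper bits are set to 1.
    static int widenedCount(long value) {
        return Long.bitCount(value);
    }

    // Keep only the low 8 bits of the byte before counting.
    static int byteCount(byte value) {
        return Integer.bitCount(value & 0xFF);
    }

    // Keep only the low 16 bits of the short before counting.
    static int shortCount(short value) {
        return Integer.bitCount(value & 0xFFFF);
    }

    // A 32-bit int needs no mask; Integer.bitCount already works on 32 bits.
    static int intCount(int value) {
        return Integer.bitCount(value);
    }

    public static void main(String[] args) {
        byte b = -1;
        short s = -1;
        int i = -1;
        System.out.println(widenedCount(b) + " vs " + byteCount(b));  // 64 vs 8
        System.out.println(widenedCount(s) + " vs " + shortCount(s)); // 64 vs 16
        System.out.println(widenedCount(i) + " vs " + intCount(i));   // 64 vs 32
    }
}
```

For non-negative inputs both approaches agree, because widening a non-negative value only adds zero bits.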
This is a bug fix. Previously, when a negative byte, short, or int was passed to `BIT_COUNT`, it was implicitly cast to a 64-bit `Long` for the evaluation (`java.lang.Long.bitCount()`). This caused the negative values to be sign-extended to 64 bits (e.g., an 8-bit `-1` became a 64-bit `-1`), leading to an incorrect bit count of `64` instead of the expected `8` for a `tinyint`.
Does this PR introduce any user-facing change?
Yes, it fixes incorrect query results for `BIT_COUNT` when passing negative inputs of types `tinyint`, `smallint`, and `int`.
Previous Behavior:
`SELECT bit_count(cast(-1 as tinyint))` -> `64`
New Behavior:
`SELECT bit_count(cast(-1 as tinyint))` -> `8`
How was this patch tested?
Added new test cases for negative 8-bit, 16-bit, and 32-bit boundary values in BitwiseExpressionsSuite.scala, and verified that they fail on the old code and pass with the new fix.
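As a rough, standalone illustration of what boundary checks of this kind might look like, the Java sketch below evaluates a width-aware bit count at the minimum value and at `-1` for each width. The helper `bitCount(value, widthBits)` and the chosen values are assumptions made for this example; they are not copied from `BitwiseExpressionsSuite`, and the helper masks at the `long` level rather than calling `Integer.bitCount` as the patch does.

```java
// Illustrative sketch: the helper and the chosen values are hypothetical.
final class BitCountBoundaries {
    // Counts the set bits within the lowest `widthBits` bits of `value`.
    static int bitCount(long value, int widthBits) {
        long mask = (widthBits == 64) ? -1L : (1L << widthBits) - 1;
        return Long.bitCount(value & mask);
    }

    // Fails loudly if the computed count differs from the expected one.
    static void check(String label, int actual, int expected) {
        if (actual != expected) {
            throw new AssertionError(label + ": expected " + expected + ", got " + actual);
        }
        System.out.println(label + " -> " + actual);
    }

    public static void main(String[] args) {
        // Minimum values have only the sign bit set within their width.
        check("tinyint min", bitCount(Byte.MIN_VALUE, 8), 1);
        check("smallint min", bitCount(Short.MIN_VALUE, 16), 1);
        check("int min", bitCount(Integer.MIN_VALUE, 32), 1);
        // -1 has every bit set within its width.
        check("tinyint -1", bitCount((byte) -1, 8), 8);
        check("smallint -1", bitCount((short) -1, 16), 16);
        check("int -1", bitCount(-1, 32), 32);
    }
}
```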
Tested successfully locally by running:
`build/sbt "sql/testOnly org.apache.spark.sql.catalyst.expressions.BitwiseExpressionsSuite"`
`build/sbt "sql/testOnly org.apache.spark.sql.MathFunctionsSuite"`
Was this patch authored or co-authored using generative AI tooling?
No