
[SPARK-55559][SQL] Fix BIT_COUNT for negative tinyint/smallint/int #54819

Open
azmatsiddique wants to merge 4 commits into apache:master from azmatsiddique:SPARK-55559

Conversation

@azmatsiddique

### What changes were proposed in this pull request?

This PR fixes a bug in the BIT_COUNT expression where negative values of types tinyint (ByteType), smallint (ShortType), and int (IntegerType) were returning incorrect results.

The evaluation logic in BitwiseCount (nullSafeEval and doGenCode) was updated to use java.lang.Integer.bitCount() with appropriate bit-masking (& 0xFF for bytes, & 0xFFFF for shorts) instead of casting all smaller integer types to 64-bit Longs.
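The masking approach can be illustrated with a standalone sketch in plain Java (hypothetical helper names, not the actual `BitwiseCount` code): each narrower type is masked to its unsigned width so that sign bits beyond the type's own width are dropped before counting.

```java
public class BitCountSketch {
    // Illustrative helpers (names are hypothetical): mask the narrower
    // integer types to their unsigned width before counting set bits,
    // so the count never exceeds the type's own bit width.
    static int bitCountByte(byte b)   { return Integer.bitCount(b & 0xFF); }
    static int bitCountShort(short s) { return Integer.bitCount(s & 0xFFFF); }
    static int bitCountInt(int i)     { return Integer.bitCount(i); }
    static int bitCountLong(long l)   { return Long.bitCount(l); }

    public static void main(String[] args) {
        System.out.println(bitCountByte((byte) -1));   // 8
        System.out.println(bitCountShort((short) -1)); // 16
        System.out.println(bitCountInt(-1));           // 32
        System.out.println(bitCountLong(-1L));         // 64
    }
}
```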

### Why are the changes needed?

This is a bug fix. Previously, when a negative byte, short, or int was passed to BIT_COUNT, it was implicitly cast to a 64-bit Long for the evaluation (java.lang.Long.bitCount()). This caused the negative values to be sign-extended to 64 bits (e.g., an 8-bit -1 became a 64-bit -1), leading to an incorrect bit count of 64 instead of the expected 8 for a tinyint.
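The sign-extension behavior described above can be reproduced directly in plain Java (illustrative only):

```java
public class SignExtensionDemo {
    public static void main(String[] args) {
        byte b = -1;       // bit pattern 0xFF: 8 bits set
        long widened = b;  // implicit widening sign-extends to 0xFFFFFFFFFFFFFFFF

        // Old evaluation path: count bits of the sign-extended 64-bit value.
        System.out.println(Long.bitCount(widened));     // 64 (incorrect for tinyint)

        // Fixed evaluation path: mask to the type's unsigned width first.
        System.out.println(Integer.bitCount(b & 0xFF)); // 8 (expected)
    }
}
```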

### Does this PR introduce any user-facing change?

Yes, it fixes incorrect query results for BIT_COUNT when passing negative inputs of types tinyint, smallint, and int.

Previous behavior:
```sql
SELECT bit_count(cast(-1 as tinyint)) -- returns 64
```

New behavior:
```sql
SELECT bit_count(cast(-1 as tinyint)) -- returns 8
```

### How was this patch tested?

Added new test cases for negative 8-bit, 16-bit, and 32-bit boundary values in BitwiseExpressionsSuite.scala, and verified that they fail on the old code and pass with the new fix.

Verified locally by running:
```
build/sbt "sql/testOnly org.apache.spark.sql.catalyst.expressions.BitwiseExpressionsSuite"
build/sbt "sql/testOnly org.apache.spark.sql.MathFunctionsSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No

…n last CSV column

### What changes were proposed in this pull request?
This PR fixes an issue where the CSV reader inconsistently parses empty quoted strings (`""`) when the `escape` option is set to an empty string (`""`). Previously, mid-line empty quoted strings correctly resolved to null/empty, but the last column resolved to a literal `"` character due to univocity parser behavior.

### Why are the changes needed?
To ensure consistent parsing of CSV data regardless of column position.

### Does this PR introduce _any_ user-facing change?
Yes, it fixes a bug where users were receiving incorrect data (a literal quote instead of an empty/null value) for the last column in a row under specific CSV configurations.

### How was this patch tested?
Added a new regression test in `CSVSuite` that verifies consistent parsing of both mid-line and end-of-line empty quoted fields.
