Add CAST expression support to parser#3207

Open
nssalian wants to merge 3 commits into
apache:mainfrom
nssalian:cast-expression-support

@nssalian
Contributor

Closes #198

Picks up where #209 left off with @Fokko's feedback addressed. Thanks @jayceslesar for the original work.

Rationale for this change

Adds CAST(column AS type) parsing that maps to Iceberg transforms (date -> DayTransform, year -> YearTransform, month -> MonthTransform, hour -> HourTransform). Also implements BoundTransform.ref() and eval(), both of which were missing abstract methods required by BoundTerm. Introduces UnboundTransform, whose bind() validates the source field type via can_transform().
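
As a rough illustration of the mapping described above, the sketch below resolves a CAST target name to a transform name, ignoring case, and rejects anything else. The table `CAST_TARGETS` and the helper `resolve_cast_target` are hypothetical names chosen for this example. They are not the parser code in this PR, and transforms are represented as plain strings so the snippet runs without pyiceberg.

```python
# Hypothetical lookup table: CAST target name -> Iceberg transform class name.
# Strings stand in for the real transform classes to keep this self-contained.
CAST_TARGETS = {
    "date": "DayTransform",
    "year": "YearTransform",
    "month": "MonthTransform",
    "hour": "HourTransform",
}


def resolve_cast_target(type_name: str) -> str:
    """Return the transform name for a CAST target, ignoring case."""
    key = type_name.strip().lower()
    if key not in CAST_TARGETS:
        raise ValueError(f"Unsupported CAST target type: {type_name!r}")
    return CAST_TARGETS[key]
```

With this sketch, `resolve_cast_target("DATE")` and `resolve_cast_target("date")` resolve to the same transform, while an unlisted target such as `"varchar"` raises `ValueError`.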

Are these changes tested?

Yes. 11 new tests covering all transform types, case insensitivity, nested fields, unsupported types, comparison operators, and boolean composition.

Are there any user-facing changes?

No.

@nssalian nssalian marked this pull request as ready for review March 30, 2026 15:46
@nssalian
Contributor Author

@Fokko @geruh @kevinjqliu PTAL

Comment thread pyiceberg/expressions/parser.py
Comment on lines +285 to +288
expected = EqualTo(
UnboundTransform(Reference("created_at"), DayTransform()),
StringLiteral("2024-01-01"),
)
Contributor

For testing, I think it would be good to have an integration test as well. The StringLiteral should be converted to a DateLiteral when bound to the transform.

Contributor Author

Added a unit test for this.
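
To make the conversion discussed in this thread concrete, here is a minimal stdlib-only sketch of why the string literal has to become a date value before the comparison is meaningful. It assumes the usual Iceberg conventions: dates are stored as days since 1970-01-01 and timestamps as microseconds since the epoch. The helper names are hypothetical and do not come from pyiceberg.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def date_str_to_days(value: str) -> int:
    """Parse an ISO date string into days since the epoch."""
    return (date.fromisoformat(value) - EPOCH_DATE).days


def to_micros(dt: datetime) -> int:
    """Convert a naive datetime into microseconds since the epoch."""
    return (dt - EPOCH_DATETIME) // timedelta(microseconds=1)


def day_ordinal(micros: int) -> int:
    """Day transform: floor a microsecond timestamp to days since the epoch."""
    return micros // MICROS_PER_DAY


# Any timestamp on 2024-01-01 lands on the same day ordinal as the date literal.
created_at = to_micros(datetime(2024, 1, 1, 15, 30))
matches = day_ordinal(created_at) == date_str_to_days("2024-01-01")
```

Comparing the transformed column against the raw string `"2024-01-01"` could never match, which is why the literal needs to be converted when it is bound.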

@nssalian nssalian requested a review from Fokko June 2, 2026 04:41
@rambleraptor
Contributor

rambleraptor commented Jun 3, 2026

I've been playing around with this PR and I ran into something.

Here's a quick demo script for your own testing. The table has one row, dated 1/1/24, which we'd expect the query in this script to match.

schema = Schema(NestedField(1, "ts", TimestampType(), required=False))

sql = "CAST(ts AS year) > 2020"
row = Record(datetime_to_micros(datetime(2024, 1, 1)))
print(expression_evaluator(schema, parser.parse(sql), case_sensitive=True)(row)) # False

It looks like we're running into this issue because we coerce ts to a TimestampType at comparison time. That means the right side is TimestampLiteral(2020), which isn't the year 2020.
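
The mismatch can be illustrated with plain arithmetic. This sketch assumes the Iceberg conventions that timestamps are microseconds since the epoch and that the year transform yields years since 1970. It does not reproduce the evaluator's actual code path, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def micros_to_datetime(micros: int) -> datetime:
    """Interpret an integer as microseconds since the epoch."""
    return EPOCH + timedelta(microseconds=micros)


def year_ordinal(micros: int) -> int:
    """Year transform: years since 1970 for a microsecond timestamp."""
    return micros_to_datetime(micros).year - EPOCH.year


# The single row from the demo script: 2024-01-01 as microseconds.
row_micros = (datetime(2024, 1, 1) - EPOCH) // timedelta(microseconds=1)

# Reading the literal 2020 as a timestamp gives an instant in 1970, not year 2020.
literal_as_timestamp = micros_to_datetime(2020)

# For the comparison to mean "after calendar year 2020", the literal would
# have to be expressed in the transform's own units (years since 1970).
literal_as_year_ordinal = 2020 - EPOCH.year

naive_result = year_ordinal(row_micros) > 2020
intended_result = year_ordinal(row_micros) > literal_as_year_ordinal
```

Here `naive_result` is `False` and `intended_result` is `True`, which matches the surprising `False` in the demo and suggests the literal should be converted to the transform's result type rather than to the source column's type.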

assert expected == parser.parse("CAST(created_at AS date) = '2024-01-01'")


def test_cast_date_binds_string_literal_to_date() -> None:
Contributor

We should have more tests like this with uneven types. I posted an example in another comment.

Development

Successfully merging this pull request may close these issues.

Add CAST to parser.py

3 participants