fix(exasol): treat LISTAGG second argument as the separator CLAUDE#7786
gaoflow wants to merge 3 commits into
Exasol routed both GROUP_CONCAT and LISTAGG through the MySQL-style _parse_group_concat, so LISTAGG(expr, sep) folded its second argument into the value via CONCAT instead of treating it as the separator. This produced a semantically wrong AST (separator duplicated into the value, non-idempotent on re-parse) that disagreed with every sibling dialect. Route LISTAGG through _parse_string_agg, matching Oracle, Snowflake, Trino and DuckDB. GROUP_CONCAT keeps _parse_group_concat since Exasol's GROUP_CONCAT genuinely uses the MySQL SEPARATOR keyword.
```python
"GROUP_CONCAT": lambda self: self._parse_group_concat(),
# https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/listagg.htm
"LISTAGG": lambda self: self._parse_string_agg(),
```
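To make the difference between the two parser strategies concrete, here is a toy Python model. It is not sqlglot's implementation: `Concat`, `GroupConcat`, `parse_group_concat_style` and `parse_string_agg_style` are hypothetical stand-ins, and the operands are pre-parsed strings rather than real AST nodes. It shows how a MySQL-style parser folds `LISTAGG(x, ',')` into a `CONCAT` value, while a `STRING_AGG`-style parser keeps `','` as the separator.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Concat:
    """Hypothetical stand-in for a CONCAT(...) expression node."""
    args: Tuple[str, ...]


@dataclass(frozen=True)
class GroupConcat:
    """Hypothetical stand-in for an aggregated-string node: value plus optional separator."""
    this: Union[str, Concat]
    separator: Optional[str] = None


def parse_group_concat_style(args: List[str]) -> GroupConcat:
    # MySQL-style: every extra positional argument is concatenated into the value.
    # (A trailing SEPARATOR keyword would set the separator; it is not modeled here.)
    value = args[0] if len(args) == 1 else Concat(tuple(args))
    return GroupConcat(this=value)


def parse_string_agg_style(args: List[str]) -> GroupConcat:
    # STRING_AGG/LISTAGG-style: the second positional argument is the separator.
    separator = args[1] if len(args) > 1 else None
    return GroupConcat(this=args[0], separator=separator)


# Toy dispatch table mirroring the FUNCTION_PARSERS entries in the diff.
FUNCTION_PARSERS = {
    "GROUP_CONCAT": parse_group_concat_style,
    "LISTAGG": parse_string_agg_style,  # before the fix: parse_group_concat_style
}

if __name__ == "__main__":
    operands = ["x", "','"]  # pre-parsed operands of LISTAGG(x, ',')
    print("before:", parse_group_concat_style(operands))
    print("after: ", FUNCTION_PARSERS["LISTAGG"](operands))
```

In this model the separator ends up as its own field on the node rather than inside the value, which is what lets a generator emit it back in the separator position.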
Based on the Exasol grammar, with this fix we can support the `listagg_overflow` syntax.
To generate it, we should change the `groupconcat_sql` entry in the `TRANSFORMS` dict in `sqlglot/generators/exasol.py` to:

```python
exp.GroupConcat: lambda self, e: groupconcat_sql(self, e, on_overflow=True),
```
After this, we should add `validate_identity` tests for this syntax:

```python
self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY y)"
)
self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITH COUNT) WITHIN GROUP (ORDER BY y)"
)
self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT) WITHIN GROUP (ORDER BY y)"
)
```
Also, there is a bug with the default delimiter, but I will fix it in a follow-up PR.
Routing LISTAGG through STRING_AGG parses the ON OVERFLOW clause, but the Exasol GroupConcat generator dropped it. Pass on_overflow=True so the clause round-trips, and add validate_identity coverage for ERROR / TRUNCATE ... WITH COUNT / TRUNCATE ... WITHOUT COUNT.
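As a rough sketch of what `on_overflow=True` has to preserve, the toy renderer below emits the `ON OVERFLOW` clause inside the `LISTAGG(...)` parentheses for the three forms in the `validate_identity` tests. It is an illustration under assumptions, not sqlglot's `groupconcat_sql`: `Overflow`, `overflow_sql` and `listagg_sql` are hypothetical names, and the clause layout is taken only from the test strings in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Overflow:
    """Hypothetical model of LISTAGG's ON OVERFLOW clause."""
    error: bool                         # True -> ON OVERFLOW ERROR
    filler: Optional[str] = None        # TRUNCATE filler literal, e.g. "'...'"
    with_count: Optional[bool] = None   # True -> WITH COUNT, False -> WITHOUT COUNT, None -> omitted


def overflow_sql(o: Overflow) -> str:
    # Render the clause text for either the ERROR or the TRUNCATE variant.
    if o.error:
        return "ON OVERFLOW ERROR"
    parts = ["ON OVERFLOW TRUNCATE"]
    if o.filler is not None:
        parts.append(o.filler)
    if o.with_count is not None:
        parts.append("WITH COUNT" if o.with_count else "WITHOUT COUNT")
    return " ".join(parts)


def listagg_sql(this: str, separator: str, order: str,
                on_overflow: Optional[Overflow] = None) -> str:
    inner = f"{this}, {separator}"
    if on_overflow is not None:
        # In this toy, skipping this branch mirrors the reported bug:
        # the clause is parsed into the node but never emitted.
        inner += " " + overflow_sql(on_overflow)
    return f"LISTAGG({inner}) WITHIN GROUP (ORDER BY {order})"


if __name__ == "__main__":
    print(listagg_sql("x", "','", "y", Overflow(error=False, filler="'...'", with_count=True)))
```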
Thanks @geooo109, done. I added `on_overflow=True` to the Exasol `GroupConcat` transform plus the three `validate_identity` cases. All three round-trip now, and the existing Exasol suite (20 tests / 582 subtests) plus the Oracle/Snowflake/Presto/Trino/DuckDB dialect suites stay green. I left the default-delimiter bug alone since you mentioned you'd handle it in a follow-up.
Summary
Exasol mis-parses the second argument of `LISTAGG`. `FUNCTION_PARSERS` routes both `GROUP_CONCAT` and `LISTAGG` through the MySQL-style `_parse_group_concat`, so `LISTAGG(expr, sep)` folds the separator into the value with a `CONCAT` instead of treating it as the separator.

Every sibling dialect parses the identical SQL correctly (`this=x`, `separator=','`):

| Dialect | `LISTAGG` parser |
|---|---|
| Oracle | `_parse_string_agg` |
| Snowflake | `_parse_string_agg` |
| Trino | `_parse_string_agg` |
| DuckDB | `_parse_string_agg` |
| Exasol | `_parse_group_concat` ← bug |

The wrong AST is also non-idempotent: re-parsing the emitted SQL nests another `CONCAT`.

Fix
Route Exasol's `LISTAGG` through `_parse_string_agg` (which maps the second argument to the separator), matching the sibling dialects. `GROUP_CONCAT` stays on `_parse_group_concat`, since Exasol's `GROUP_CONCAT` genuinely uses the MySQL `SEPARATOR` keyword.

After the fix, Exasol's `LISTAGG` AST is identical to Oracle/Snowflake/Trino/DuckDB and round-trips cleanly.

Verification
- `main` (`945d1e9d`): Exasol `LISTAGG(x, ',')` parses to a `CONCAT`-folded AST that differs from all four sibling dialects.
- … `test_stringFunctions`; both fail before the fix and pass after.
- … `==` to the Oracle/Snowflake/Trino/DuckDB ASTs for the same SQL, and that `GROUP_CONCAT` parsing is unchanged.
- `ruff` clean.

This pull request was prepared with the assistance of AI, under my direction and review.