Skip to content

fix(exasol): treat LISTAGG second argument as the separator CLAUDE#7786

Open
gaoflow wants to merge 3 commits into
tobymao:mainfrom
gaoflow:fix-exasol-listagg-separator
Open

fix(exasol): treat LISTAGG second argument as the separator CLAUDE#7786
gaoflow wants to merge 3 commits into
tobymao:mainfrom
gaoflow:fix-exasol-listagg-separator

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Exasol mis-parses the second argument of LISTAGG. FUNCTION_PARSERS routes both GROUP_CONCAT and LISTAGG through the MySQL-style _parse_group_concat, so LISTAGG(expr, sep) folds the separator into the value with a CONCAT instead of treating it as the separator:

>>> import sqlglot
>>> sqlglot.parse_one("SELECT LISTAGG(x, ',') WITHIN GROUP (ORDER BY y) FROM t", read="exasol")
# GroupConcat(this=Concat(x, ','), separator=None)   <-- wrong

Every sibling dialect parses the identical SQL correctly (this=x, separator=','):

dialect LISTAGG parser
Oracle _parse_string_agg
Snowflake _parse_string_agg
Trino _parse_string_agg
DuckDB _parse_string_agg
Exasol _parse_group_concat ← bug

The wrong AST is also non-idempotent: re-parsing the emitted SQL nests another CONCAT.

Fix

Route Exasol's LISTAGG through _parse_string_agg (which maps the second argument to the separator), matching the sibling dialects. GROUP_CONCAT stays on _parse_group_concat, since Exasol's GROUP_CONCAT genuinely uses the MySQL SEPARATOR keyword.

After the fix, Exasol's LISTAGG AST is identical to Oracle/Snowflake/Trino/DuckDB and round-trips cleanly.

Verification

  • Reproduced on main (945d1e9d): Exasol LISTAGG(x, ',') parses to a CONCAT-folded AST that differs from all four sibling dialects.
  • Added a round-trip identity test and a cross-dialect read test in test_stringFunctions; both fail before the fix and pass after.
  • Confirmed the fixed Exasol AST is == to the Oracle/Snowflake/Trino/DuckDB ASTs for the same SQL, and that GROUP_CONCAT parsing is unchanged.
  • Full test suite: 1126 passed, 18227 subtests, no regressions. ruff clean.

This pull request was prepared with the assistance of AI, under my direction and review.

Exasol routed both GROUP_CONCAT and LISTAGG through the MySQL-style
_parse_group_concat, so LISTAGG(expr, sep) folded its second argument
into the value via CONCAT instead of treating it as the separator. This
produced a semantically wrong AST (separator duplicated into the value,
non-idempotent on re-parse) that disagreed with every sibling dialect.

Route LISTAGG through _parse_string_agg, matching Oracle, Snowflake,
Trino and DuckDB. GROUP_CONCAT keeps _parse_group_concat since Exasol's
GROUP_CONCAT genuinely uses the MySQL SEPARATOR keyword.
Comment thread sqlglot/parsers/exasol.py
Comment on lines +122 to +124
"GROUP_CONCAT": lambda self: self._parse_group_concat(),
# https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/listagg.htm
"LISTAGG": lambda self: self._parse_string_agg(),

@geooo109 geooo109 Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the exasol grammar with this fix we can support the listagg_overflow syntax.

In order to achieve this generation, we should change the groupconcat_sql in sqlglot/generators/exasol.py -> TRANSFORMS dict to this:

exp.GroupConcat: lambda self, e: groupconcat_sql(self, e, on_overflow=True),

After this, we should add validate_identity tests for this syntax:

self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY y)"
)
self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITH COUNT) WITHIN GROUP (ORDER BY y)"
)
self.validate_identity(
    "LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT) WITHIN GROUP (ORDER BY y)"
)

Also there is a bug with the default delimiter but I will fix it in a following PR.

Routing LISTAGG through STRING_AGG parses the ON OVERFLOW clause, but the
Exasol GroupConcat generator dropped it. Pass on_overflow=True so the clause
round-trips, and add validate_identity coverage for ERROR / TRUNCATE ... WITH
COUNT / TRUNCATE ... WITHOUT COUNT.
@gaoflow

gaoflow commented Jun 23, 2026

Copy link
Copy Markdown
Author

Thanks @geooo109 — done. I added on_overflow=True to the Exasol GroupConcat transform so the ON OVERFLOW clause survives generation, and added the three validate_identity cases you suggested:

LISTAGG(x, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY y)
LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITH COUNT) WITHIN GROUP (ORDER BY y)
LISTAGG(x, ',' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT) WITHIN GROUP (ORDER BY y)

All three round-trip now, and the existing Exasol suite (20 tests / 582 subtests) plus the Oracle/Snowflake/Presto/Trino/DuckDB dialect suites stay green. I left the default-delimiter bug alone since you mentioned you'd handle it in a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants