Skip to content

MDEV-39005: Assertion failure on hint-forced Split-Materialized plan#5279

Open
spetrunia wants to merge 1 commit into
12.3from
12.3-MDEV-39005-split-mat-variant2
Open

MDEV-39005: Assertion failure on hint-forced Split-Materialized plan#5279
spetrunia wants to merge 1 commit into
12.3from
12.3-MDEV-39005-split-mat-variant2

Conversation

@spetrunia

Copy link
Copy Markdown
Member

The problem: consider a query in form:
t1 JOIN (SELECT a FROM t2 GROUP BY a) dt ON dt.a=t1.a

After fix for MDEV-30877, the optimizer computes the number of GROUP BY groups that we will have in the materialized derived table. It is saved into TABLE(dt)->used_stat_records and propagates into TABLE(dt)->opt_range_condition_rows.

On the other hand, the Split-Materialized optimization tries to compute the number of rows in the temp. table when we're running in the the Split-Materialized mode and only populate the matching groups' rows there.

These numbers may come into disagreement: Split-Materialized may produce a higher number, save it into JOIN_TAB::records and cause an assertion failure in cardinality code due to having

table->opt_range_condition_rows > join_tab->records

Split-Materialized code's computations omit a lot, like the whole grouping operation.
For now, use a trivial fix: make sure Split-Materialized's estimate of JOIN_TAB::records doesn't exceed the derived table size.

The problem: consider a query in form:
  t1 JOIN (SELECT a FROM t2 GROUP BY a) dt ON dt.a=t1.a

After fix for MDEV-30877, the optimizer computes the number of GROUP BY
groups that we will have in the materialized derived table. It is saved
into TABLE(dt)->used_stat_records and propagates into
TABLE(dt)->opt_range_condition_rows.

On the other hand, the Split-Materialized optimization tries to compute
the number of rows in the temp. table when we're running in the the
Split-Materialized mode and only populate the matching groups' rows there.

These numbers may come into disagreement: Split-Materialized may produce
a higher number, save it into JOIN_TAB::records and cause an assertion
failure in cardinality code due to having

  table->opt_range_condition_rows > join_tab->records

Split-Materialized code's computations omit a lot, like the whole grouping
operation.
For now, use a trivial fix: make sure Split-Materialized's estimate of
JOIN_TAB::records doesn't exceed the derived table size.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses MDEV-39005 to fix an assertion failure on hint-forced Split-Materialized plans. It updates the record count estimation in sql/opt_split.cc by using the minimum of the record count and the used stat records, and adds corresponding test cases. The review feedback correctly points out a bug where join->join_record_count refers to the parent join instead of the child join, and suggests using spl_opt_info->unsplit_card instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/opt_split.cc
Comment on lines +1283 to +1284
records= (ha_rows)MY_MIN(join->join_record_count,
table->used_stat_records);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The comment above states that we want to use the child join's join_record_count (the number of records in the derived table before the GROUP BY operation). However, in choose_best_splitting, join refers to the parent join object, meaning join->join_record_count evaluates to the parent join's record count instead of the child join's.

To correctly access the child join's record count, we should use spl_opt_info->unsplit_card, which was populated with the child join's join_record_count in add_keyuses_for_splitting().

    records= (ha_rows)MY_MIN(spl_opt_info->unsplit_card,
                             table->used_stat_records);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini, look at the first lines in

{
  SplM_opt_info *spl_opt_info= table->spl_opt_info;
  DBUG_ASSERT(spl_opt_info != NULL);
  JOIN *join= spl_opt_info->join;

The join comes from spl_opt_info->join.
This one is set in bool JOIN::check_for_splittable_materialized() :

  spl_opt_info->join= this;

so it refers to the child join, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't answer if there is a comma after "/gemini", it does for messages starting with "/gemini " (space instead of comma)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini look at the first lines in

{
SplM_opt_info *spl_opt_info= table->spl_opt_info;
DBUG_ASSERT(spl_opt_info != NULL);
JOIN *join= spl_opt_info->join;
The join comes from spl_opt_info->join.
This one is set in bool JOIN::check_for_splittable_materialized() :

spl_opt_info->join= this;
so it refers to the child join, right?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. I apologize for the confusion in my previous comment. Since spl_opt_info->join is explicitly set to this (the child join) in JOIN::check_for_splittable_materialized(), join->join_record_count does indeed refer to the child join's record count. Thank you for the clarification.

groups_20_2 int not null,
b int,
primary key (groups_20, groups_20_2)
) engine=innodb;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce the bug on MyISAM, do we really need InnoDB here?

@mariadb-OlegSmirnov

Copy link
Copy Markdown
Contributor

The commit message says " assertion failure in cardinality code due to having table->opt_range_condition_rows > join_tab->records" but isn't it the opposite? Before the patch join_tab->records == 10, after the patch join_tab->records == 2 and table->opt_range_condition_rows == 2 in both cases.

@mariadb-OlegSmirnov

Copy link
Copy Markdown
Contributor

Another point regarding the commit message: "trivial fix: make sure ... JOIN_TAB::records doesn't exceed the derived table size" - this looks more like a description of the previous version of the patch where we only clamped the number of records. I think your patch is less trivial, worth describing a bit wider

@Olernov Olernov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants