Skip to content
/ server Public

Conversation

@ParadoxV5
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-35879

Description

Master_info::heartbeat_period was previously a float, which only has 7 decimal digits of precision on most platforms, which is less than the maximum of 10 its range allows, let alone superfluous user inputs.

This commit solves this issue by changing this
field to a DECIMAL(10, 3); specifically:

  • A uint32_t of milliseconds internally,
  • A my_decimal of seconds when parsing CHANGE MASTER,
  • A double of seconds when showing to the user, as reads are not affected by external precision issues

While here, this commit also corrects the rounding method from “round down” to “nearest millisecond”.

Release Notes

Fix CHANGE MASTER TO master_heartbeat_period imprecision for large values

How can this PR be tested?

This commit adds another test value and additional queries to rpl.rpl_heartbeat_basic.

PR quality check

  • This is a new feature or a refactoring, and the PR is based against the main branch.
    • This commit does not move code.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

`Master_info::heartbeat_period` was previously a `float`, which only has
7 decimal digits of precision on most platforms, which is less than the
maximum of 10 its range allows, let alone superfluous user inputs.

This commit solves this issue by changing this
field to a `DECIMAL(10, 3)`; specifically:
* A `uint32_t` of milliseconds internally,
* A `my_decimal` of seconds when parsing CHANGE MASTER,
* A `double` of seconds when showing to the user,
  as reads are not affected by external precision issues

While here, this commit also corrects the rounding
method from “round down” to “nearest millisecond”.
@ParadoxV5 ParadoxV5 added MariaDB Corporation Replication Patches involved in replication labels Nov 17, 2025
Value 5.000
connection slave;
change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0009999;
change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0004999;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I keep the rounding method change out of 10.11?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put the whole patch into main (or 12.3), it isn't an issue that any users/customers have complained about, and it doesn't have a broader impact. Better not to put those into older/more stable versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole patch

Can you please clarify: Not just the rounding method change, but also the imprecision (main topic of MDEV-35879)?

Then, perhaps some or all of the points in MDEV-37529 should only be addressed on the main branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range testing of master_heartbeat_period is duplicated between rpl_heartbeat_basic (runs with mix only) and rpl_heartbeat (tests all three binlog formats).

Copy link
Contributor Author

@ParadoxV5 ParadoxV5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge nit:
The double interfaces of decimal_t are implemented by printing into a string and parsing that char array.
Even without benchmarks, this already sounds impractical.

Comment on lines +2272 to +2278
Decimal_from_double(double value): my_decimal()
{
int unexpected_error __attribute__((unused))=
double2my_decimal(E_DEC_ERROR, value, this);
DBUG_ASSERT(!unexpected_error);
}
} MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000.0, THOUSAND= 1000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Decimal_from_double(double value): my_decimal()
{
int unexpected_error __attribute__((unused))=
double2my_decimal(E_DEC_ERROR, value, this);
DBUG_ASSERT(!unexpected_error);
}
} MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000.0, THOUSAND= 1000;
Decimal_from_ll(longlong value): my_decimal()
{
int unexpected_error __attribute__((unused))=
int2my_decimal(E_DEC_ERROR, value, false, this);
DBUG_ASSERT(!unexpected_error);
}
} MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000, THOUSAND= 1000;

Comment on lines +2290 to +2291
decimal_mul(&rounded, &THOUSAND, &decimal_buffer) |
decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
decimal_mul(&rounded, &THOUSAND, &decimal_buffer) |
decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period));
/*
(The ideal order would export to a `double` and *then* multiply,
but disappointingly, decimal2double() is implemented
by printing into a string and parsing that char array.)
*/
decimal_mul(&rounded, &THOUSAND, &decimal_buffer) |
decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest avoiding emotion in code comments, and stick to the facts. I.e., instead of

but disappointingly, decimal2double() is implemented by printing into a string and parsing that char array.)

to something like

but as of this patch, decimal2double() is implemented by printing into a string and parsing that char array, which is extra overhead that we can bypass by ...

ParadoxV5 added a commit that referenced this pull request Nov 19, 2025
MDEV-35879 Slave_heartbeat_period is imprecise
* Increase maximum to `std::numeric_limits<uint32_t>::max() ÷ 1000`,
  i.e., `4294967.295` *exactly*
* Excludes its additional tests as they are exclusive to that issue fix.
  * But not the `0.00049` case, which was *changed*, not *added*.
* Some changes are refactors and so are exclusive to the `main` branch.
  * Delete tests in `rpl.rpl_heartbeat`
    that duplicates `rpl.rpl_heartbeat_basic`
Copy link
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ParadoxV5 !

Thanks for the cleanup! I've left a few notes (and responses to your already existing comments, but I don't think those post as part of the review, IIRC).

Value 5.000
connection slave;
change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0009999;
change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0004999;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put the whole patch into main (or 12.3), it isn't an issue that any users/customers have complained about, and it doesn't have a broader impact. Better not to put those into older/more stable versions.

eval CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=$MASTER_MYPORT, MASTER_USER='root', MASTER_CONNECT_RETRY=$connect_retry, MASTER_HEARTBEAT_PERIOD=0.0009;
eval CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=$MASTER_MYPORT, MASTER_USER='root', MASTER_CONNECT_RETRY=$connect_retry, MASTER_HEARTBEAT_PERIOD=0.00049;
SHOW GLOBAL STATUS LIKE 'slave_heartbeat_period';
--source include/show_slave_status.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest adding a comment to why you show this twice from different sources.

|| mi->heartbeat_period == 0);

mi->heartbeat_period= MY_MIN(SLAVE_MAX_HEARTBEAT_PERIOD,
slave_net_timeout*500ULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is an optimization of (x*1000)/2. I think the compiler should automatically do it though, and it would improve readability to have the full equation. Do you know how to disassemble code? I'd be curious if the generated instructions are the same or not. If not, then we can keep it (though I'd suggest a comment with the full equation).

if (unlikely(Lex->mi.heartbeat_period > slave_net_timeout))
static const struct Decimal_from_double: my_decimal
{
Decimal_from_double(double value): my_decimal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder why my_decimal doesn't have this constructor innately? Would it make sense to add it? Though that'd also be a question for the runtime team

Comment on lines +2290 to +2291
decimal_mul(&rounded, &THOUSAND, &decimal_buffer) |
decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest avoiding emotion in code comments, and stick to the facts. I.e., instead of

but disappointingly, decimal2double() is implemented by printing into a string and parsing that char array.)

to something like

but as of this patch, decimal2double() is implemented by printing into a string and parsing that char array, which is extra overhead that we can bypass by ...

// decomposed from my_decimal2int() to reduce a bit of computations
auto rounded= my_decimal();
int unexpected_error __attribute__((unused))=
decimal_round(decimal, &rounded, 3, HALF_UP) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the compiler guarantees the order-of-operations with bitwise OR (i.e., the three functions involved here may run in any order, which is not what you intend, I think). I think this would be better served by &&s, also because the &&'s short-circuit if there's an error in an earlier stage:

!decimal_round && !decimal_mul && !decimal2ulonglong

or

!(decimal_round || decimal_mul || decimal2ulonglong)

I think that it reads better as well (either way), as it is clear you are working with a single boolean outcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call; also affects #4430

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

MariaDB Corporation Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

3 participants