-
-
Notifications
You must be signed in to change notification settings - Fork 2k
MDEV-35879 Slave_heartbeat_period is imprecise #4441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 10.11
Are you sure you want to change the base?
Conversation
`Master_info::heartbeat_period` was previously a `float`, which only has 7 decimal digits of precision on most platforms, which is less than the maximum of 10 its range allows, let alone superfluous user inputs. This commit solves this issue by changing this field to a `DECIMAL(10, 3)`; specifically: * A `uint32_t` of milliseconds internally, * A `my_decimal` of seconds when parsing CHANGE MASTER, * A `double` of seconds when showing to the user, as reads are not affected by external precision issues While here, this commit also corrects the rounding method from “round down” to “nearest millisecond”.
| Value 5.000 | ||
| connection slave; | ||
| change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0009999; | ||
| change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0004999; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I keep the rounding method change out of 10.11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should put the whole patch into main (or 12.3), it isn't an issue that any users/customers have complained about, and it doesn't have a broader impact. Better not to put those into older/more stable versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole patch
Can you please clarify: Not just the rounding method change, but also the imprecision (main topic of MDEV-35879)?
Then, perhaps some or all of the points in MDEV-37529 should only be addressed on the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Range testing of master_heartbeat_period is duplicated between rpl_heartbeat_basic (runs with mix only) and rpl_heartbeat (tests all three binlog formats).
also, forgot a `;` agAin 🫠
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge nit:
The double interfaces of decimal_t are implemented by printing into a string and parsing that char array.
Even without benchmarks, this already sounds impractical.
| Decimal_from_double(double value): my_decimal() | ||
| { | ||
| int unexpected_error __attribute__((unused))= | ||
| double2my_decimal(E_DEC_ERROR, value, this); | ||
| DBUG_ASSERT(!unexpected_error); | ||
| } | ||
| } MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000.0, THOUSAND= 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Decimal_from_double(double value): my_decimal() | |
| { | |
| int unexpected_error __attribute__((unused))= | |
| double2my_decimal(E_DEC_ERROR, value, this); | |
| DBUG_ASSERT(!unexpected_error); | |
| } | |
| } MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000.0, THOUSAND= 1000; | |
| Decimal_from_ll(longlong value): my_decimal() | |
| { | |
| int unexpected_error __attribute__((unused))= | |
| int2my_decimal(E_DEC_ERROR, value, false, this); | |
| DBUG_ASSERT(!unexpected_error); | |
| } | |
| } MAX_PERIOD= SLAVE_MAX_HEARTBEAT_PERIOD/1000, THOUSAND= 1000; |
| decimal_mul(&rounded, &THOUSAND, &decimal_buffer) | | ||
| decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| decimal_mul(&rounded, &THOUSAND, &decimal_buffer) | | |
| decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period)); | |
| /* | |
| (The ideal order would export to a `double` and *then* multiply, | |
| but disappointingly, decimal2double() is implemented | |
| by printing into a string and parsing that char array.) | |
| */ | |
| decimal_mul(&rounded, &THOUSAND, &decimal_buffer) | | |
| decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest avoiding emotion in code comments, and stick to the facts. I.e., instead of
but disappointingly, decimal2double() is implemented by printing into a string and parsing that char array.)
to something like
but as of this patch, decimal2double() is implemented by printing into a string and parsing that char array, which is extra overhead that we can bypass by ...
MDEV-35879 Slave_heartbeat_period is imprecise
* Increase maximum to `std::numeric_limits<uint32_t>::max() ÷ 1000`,
i.e., `4294967.295` *exactly*
* Excludes its additional tests as they are exclusive to that issue fix.
* But not the `0.00049` case, which was *changed*, not *added*.
* Some changes are refactors and so are exclusive to the `main` branch.
* Delete tests in `rpl.rpl_heartbeat`
that duplicates `rpl.rpl_heartbeat_basic`
bnestere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ParadoxV5 !
Thanks for the cleanup! I've left a few notes (and responses to your already existing comments, but I don't think those post as part of the review, IIRC).
| Value 5.000 | ||
| connection slave; | ||
| change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0009999; | ||
| change master to master_host='127.0.0.1',master_port=MASTER_PORT, master_user='root', master_heartbeat_period= 0.0004999; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should put the whole patch into main (or 12.3), it isn't an issue that any users/customers have complained about, and it doesn't have a broader impact. Better not to put those into older/more stable versions.
| eval CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=$MASTER_MYPORT, MASTER_USER='root', MASTER_CONNECT_RETRY=$connect_retry, MASTER_HEARTBEAT_PERIOD=0.0009; | ||
| eval CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=$MASTER_MYPORT, MASTER_USER='root', MASTER_CONNECT_RETRY=$connect_retry, MASTER_HEARTBEAT_PERIOD=0.00049; | ||
| SHOW GLOBAL STATUS LIKE 'slave_heartbeat_period'; | ||
| --source include/show_slave_status.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest adding a comment to why you show this twice from different sources.
| || mi->heartbeat_period == 0); | ||
|
|
||
| mi->heartbeat_period= MY_MIN(SLAVE_MAX_HEARTBEAT_PERIOD, | ||
| slave_net_timeout*500ULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is an optimization of (x*1000)/2. I think the compiler should automatically do it though, and it would improve readability to have the full equation. Do you know how to disassemble code? I'd be curious if the generated instructions are the same or not. If not, then we can keep it (though I'd suggest a comment with the full equation).
| if (unlikely(Lex->mi.heartbeat_period > slave_net_timeout)) | ||
| static const struct Decimal_from_double: my_decimal | ||
| { | ||
| Decimal_from_double(double value): my_decimal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder why my_decimal doesn't have this constructor innately? Would it make sense to add it? Though that'd also be a question for the runtime team
| decimal_mul(&rounded, &THOUSAND, &decimal_buffer) | | ||
| decimal2ulonglong(&decimal_buffer, &(Lex->mi.heartbeat_period)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest avoiding emotion in code comments, and stick to the facts. I.e., instead of
but disappointingly, decimal2double() is implemented by printing into a string and parsing that char array.)
to something like
but as of this patch, decimal2double() is implemented by printing into a string and parsing that char array, which is extra overhead that we can bypass by ...
| // decomposed from my_decimal2int() to reduce a bit of computations | ||
| auto rounded= my_decimal(); | ||
| int unexpected_error __attribute__((unused))= | ||
| decimal_round(decimal, &rounded, 3, HALF_UP) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the compiler guarantees the order-of-operations with bitwise OR (i.e., the three functions involved here may run in any order, which is not what you intend, I think). I think this would be better served by &&s, also because the &&'s short-circuit if there's an error in an earlier stage:
!decimal_round && !decimal_mul && !decimal2ulonglong
or
!(decimal_round || decimal_mul || decimal2ulonglong)
I think that it reads better as well (either way), as it is clear you are working with a single boolean outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call; also affects #4430
Description
Master_info::heartbeat_periodwas previously afloat, which only has 7 decimal digits of precision on most platforms, which is less than the maximum of 10 its range allows, let alone superfluous user inputs.This commit solves this issue by changing this
field to a
DECIMAL(10, 3); specifically:uint32_tof milliseconds internally,my_decimalof seconds when parsing CHANGE MASTER,doubleof seconds when showing to the user, as reads are not affected by external precision issuesWhile here, this commit also corrects the rounding method from “round down” to “nearest millisecond”.
Release Notes
Fix
CHANGE MASTER TO master_heartbeat_periodimprecision for large valuesHow can this PR be tested?
This commit adds another test value and additional queries to
rpl.rpl_heartbeat_basic.PR quality check
This is a new feature or a refactoring, and the PR is based against themainbranch.