Skip to content

MDEV-38144: update Optional_metadata_fields to use new aggregate per-column struct#4720

Open
EricHayter wants to merge 27 commits into
MariaDB:mainfrom
EricHayter:mdev-38144
Open

MDEV-38144: update Optional_metadata_fields to use new aggregate per-column struct#4720
EricHayter wants to merge 27 commits into
MariaDB:mainfrom
EricHayter:mdev-38144

Conversation

@EricHayter

Copy link
Copy Markdown

Currently Optional_metadata_fields has many members that use classes from the C++ standard library, most notably the use of std::vector and std::string. These members have been updated to use existing MariaDB types instead.

@EricHayter

Copy link
Copy Markdown
Author

Running .clangformat seemed to add lots of line changes. Working on providing a separate commit just with the main changes to make review easier.

@EricHayter EricHayter force-pushed the mdev-38144 branch 2 times, most recently from 1c7c5f4 to 18a122c Compare March 2, 2026 20:08
@grooverdan grooverdan requested a review from bnestere March 2, 2026 21:35
@grooverdan

Copy link
Copy Markdown
Member

Running .clangformat seemed to add lots of line changes. Working on providing a separate commit just with the main changes to make review easier.

I think it was determined that the coding standard wasn't acheivable with clang-format. I'm not sure if worth the effort.

@grooverdan grooverdan added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 2, 2026

@bnestere bnestere left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @EricHayter !

Thank you for submitting the PR! It is good to update our types to be consistent with MariaDB types. One thing this PR misses, that is in the description of MDEV-38144, is that rather than having an array per field, it would be better to have one array that has elements of a complex type which describe a field.

Previously, another contributor had prepared a patch, PR #4494, which looked to satisfy this; however, it was never thoroughly reviewed because they never got it to compile / pass tests on our CI system. I'd suggest looking through it for some inspiration (though I'm not sure of the quality of it).

Something the previous patch didn't try (though the JIRA mentions) is to use the Column_definition type defined in sql/field.h. That would be another improvement.

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

LGTM after reverting the space-only changes.

Please keep working with Brandon on the final review.

Comment thread sql/sql_array.h
Comment thread sql/sql_array.h
size_t old_size= elements();
if (reserve(new_size))
return true;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid space only changes.

@EricHayter

Copy link
Copy Markdown
Author

Hi @EricHayter !

Thank you for submitting the PR! It is good to update our types to be consistent with MariaDB types. One thing this PR misses, that is in the description of MDEV-38144, is that rather than having an array per field, it would be better to have one array that has elements of a complex type which describe a field.

Previously, another contributor had prepared a patch, PR #4494, which looked to satisfy this; however, it was never thoroughly reviewed because they never got it to compile / pass tests on our CI system. I'd suggest looking through it for some inspiration (though I'm not sure of the quality of it).

Something the previous patch didn't try (though the JIRA mentions) is to use the Column_definition type defined in sql/field.h. That would be another improvement.

Noted. I'll take a look at that previous PR and update mine accordingly. Thanks.

Comment thread sql/log_event.cc
Currently Optional_metadata_fields has many members that use classes
from the C++ standard library, most notably the use of std::vector and
std::string. These members have been updated to use existing MariaDB
types instead.
Beginning to create a per-field aggregate struct for all of the possibly
required metadata values. Slowly going to start incorporating the
existing vectors into this.
@EricHayter EricHayter changed the title MDEV-38144: update Optional_metadata_fields to use MariaDB types MDEV-38144: update Optional_metadata_fields to use new aggregate per-column struct Mar 14, 2026
@EricHayter

Copy link
Copy Markdown
Author

I've made some changes to the PR introducing a new struct (Column_metadata) to store the metadata for each column. I did initially investigate using the existing Column_definition as mentioned by @bnestere earlier, but including the headers from the server code (which contains the definition for Column_definition) seemed to create some clashing with macros from the client code which seemed a bit too sharp for me to be changing.

Please note that this newer version of the PR does end up changing more code as the logic for parsing the optional metadata data is a bit more involved with this new struct.

I've also updated the PR title to be representative of the change.

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep a single commit.

@ParadoxV5 ParadoxV5 removed their request for review March 21, 2026 00:02
@ParadoxV5 ParadoxV5 dismissed their stale review March 21, 2026 00:02

Obsolete

Wasn't using the fact that the default charset and charset fields were mutually exclusive. Also removed the need to use optionals anywhere in the optional metadata stuff. Just filling the char columns with the charset first and then overrriding later...
@EricHayter EricHayter requested a review from bnestere May 16, 2026 08:59
Neovim keeps stripping end of line whitespace. I've whitelisted the repo from now to avoid this happening again
@EricHayter

Copy link
Copy Markdown
Author

I've added the appropriate test cases that you requested for the bug fix. Code should be ready for review again.

@bnestere bnestere left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @EricHayter for the changes! Sorry it took a bit until I had time to review this.

This patch is looking really good, I've got a few more notes. if you're able to address them in the next few days, we can likely get this into our upcoming 13.1 preview release. No problem if not

Comment thread sql/sql_array.h Outdated
Comment thread sql/log_event.cc Outdated
Comment thread mysql-test/suite/binlog/t/binlog_signedness_parse.test Outdated
Comment thread mysql-test/suite/binlog/t/binlog_signedness_parse.test
Comment thread sql/log_event.cc Outdated
Comment thread sql/log_event.cc
@bnestere

Copy link
Copy Markdown
Contributor

@EricHayter one more thought, it looks like the files still include the C++ libraries, but I don't think they're needed any longer (e.g. <vector>). Can you remove unused headers with this too?

@EricHayter

Copy link
Copy Markdown
Author

@bnestere everything seems pretty straightforward here. Should be able to fix these up by EOD.

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please fix the compilation error the buildbot is reporting?

/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc: In member function ‘bool Table_map_log_event::init_charset_field(bool (*)(Binlog_type_info*, Field*), Table_map_log_event::Optional_metadata_field_type, Table_map_log_event::Optional_metadata_field_type)’:
/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc:7009:8: error: ‘map’ is not a member of ‘std’
 7009 |   std::map<uint, uint> collation_map;
      |        ^~~
/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc:64:1: note: ‘std::map’ is defined in header ‘<map>’; did you forget to ‘#include <map>’?
   63 | #include "zlib.h"
  +++ |+#include <map>
   64 | 

@EricHayter

Copy link
Copy Markdown
Author

Can you please fix the compilation error the buildbot is reporting?

/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc: In member function ‘bool Table_map_log_event::init_charset_field(bool (*)(Binlog_type_info*, Field*), Table_map_log_event::Optional_metadata_field_type, Table_map_log_event::Optional_metadata_field_type)’:
/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc:7009:8: error: ‘map’ is not a member of ‘std’
 7009 |   std::map<uint, uint> collation_map;
      |        ^~~
/home/buildbot/amd64-debian-11-debug-ps-embedded/build/sql/log_event_server.cc:64:1: note: ‘std::map’ is defined in header ‘<map>’; did you forget to ‘#include <map>’?
   63 | #include "zlib.h"
  +++ |+#include <map>
   64 | 

Sorry about that. I seemed to have some different compile flags when compiling locally (was building fine on my system). Should be fixed now.

@bnestere bnestere left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved for the 13.1 preview release.

To explain our release process, the first public release we do is a preview release, which is when our testers do QA testing for new features. For 13.1, this is from when the preview release is assembled (probably in the next few days) through July 28. If our testers find any issues with this feature in this time, they must be fixed by July 28 to qualify for the RC release. Will you be generally available in this time to fix any possible issues?

For this preview release, I've:

  1. Squashed your commits into a single commit & commit message (97846b6).
  2. Applied a follow-up commit with some small fixes (d5cbd5a).

My changes won't be merged into main. Before the RC release, please do that yourself on this PR (as you see fit) and I will use your version for the official patch (which will get merged into main).

BTW, a bonus before the RC deadline (July 28) would be to replace the std::map in log_event_server.cc , which needed you to still need the <map> type.

Thanks for all your work on this, @EricHayter !

@EricHayter

Copy link
Copy Markdown
Author

Sounds great!

I will be available if anything does come up, I'll clean up the PR into a singular commit sometime soon. That being said, I couldn't quite parse what you were saying relating to the bonus bit there about std::map.

@bnestere

Copy link
Copy Markdown
Contributor

@EricHayter

log_event_server.cc has the following function that uses std::map

bool Table_map_log_event::init_charset_field(
    bool (* include_type)(Binlog_type_info *, Field *),
    Optional_metadata_field_type default_charset_type,
    Optional_metadata_field_type column_charset_type)
{
  DBUG_EXECUTE_IF("simulate_init_charset_field_error", return true;);

  std::map<uint, uint> collation_map;

yet we have our own internal Hash_map / HASH type which should be used instead for consistency.

This is why your patch couldn't remove #include <map>

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Now in testing.

@knielsen knielsen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, thanks! I only took a quick look through, I'm not that
familiar with the optional_metadata_fields stuff, but IIUC, Brandon knows
about that.

Just a few minor comments:

> diff --git a/mysql-test/suite/binlog/t/binlog_signedness_parse.test b/mysql-test/suite/binlog/t/binlog_signedness_parse.test
> new file mode 100644
> index 00000000000..061c54a65be
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_signedness_parse.test

> +SET GLOBAL binlog_row_metadata = FULL;

> +SET GLOBAL binlog_row_metadata = NO_LOG;

Normally in test cases we do:

SET @old_metadata= @@GLOBAL.binlog_row_metadata;
SET GLOBAL binlog_row_metadata = FULL;
...
SET GLOBAL binlog_row_metadata = @old_metadata;

This way, the test will also restore the environment correctly if the test
suite is run with eg. ./mtr --mysqld=--binlog-row-metadata=NO_LOG.
(But it's not a huge issue).

> diff --git a/sql/log_event.cc b/sql/log_event.cc
> index 30e5ab7eb51..74e60b292a6 100644
> --- a/sql/log_event.cc
> +++ b/sql/log_event.cc
> @@ -97,8 +97,8 @@ TYPELIB binlog_checksum_typelib= CREATE_TYPELIB_FOR(binlog_checksum_type_names);
>     The checksum-aware servers extract FD's version to decide whether the FD event
>     carries checksum info.
>  
> -   TODO: correct the constant when it has been determined 
> -   (which main tree to push and when) 
> +   TODO: correct the constant when it has been determined
> +   (which main tree to push and when)
>  */
>  const Version checksum_version_split_mysql(5, 6, 1);
>  const Version checksum_version_split_mariadb(5, 3, 0);
> @@ -405,8 +405,8 @@ query_event_uncompress(const Format_description_log_event *description_event,
>      *newlen+= BINLOG_CHECKSUM_LEN;
>    
>    uint32 alloc_size= (uint32)ALIGN_SIZE(*newlen);
> -  
> -  if (alloc_size <= buf_size) 
> +
> +  if (alloc_size <= buf_size)
>      new_dst= buf;
>    else 
>    {
> @@ -3534,28 +3534,28 @@ bool Annotate_rows_log_event::is_valid() const
>  
>  /**
>    @page How replication of field metadata works.
> -  
> -  When a table map is created, the master first calls 
> -  Table_map_log_event::save_field_metadata() which calculates how many 
> -  values will be in the field metadata. Only those fields that require the 
> -  extra data are added. The method also loops through all of the fields in 
> +
> +  When a table map is created, the master first calls
> +  Table_map_log_event::save_field_metadata() which calculates how many
> +  values will be in the field metadata. Only those fields that require the
> +  extra data are added. The method also loops through all of the fields in
>    the table calling the method Field::save_field_metadata() which returns the
>    values for the field that will be saved in the metadata and replicated to
>    the slave. Once all fields have been processed, the table map is written to
>    the binlog adding the size of the field metadata and the field metadata to
>    the end of the body of the table map.
>  
> -  When a table map is read on the slave, the field metadata is read from the 
> -  table map and passed to the table_def class constructor which saves the 
> -  field metadata from the table map into an array based on the type of the 
> -  field. Field metadata values not present (those fields that do not use extra 
> -  data) in the table map are initialized as zero (0). The array size is the 
> +  When a table map is read on the slave, the field metadata is read from the
> +  table map and passed to the table_def class constructor which saves the
> +  field metadata from the table map into an array based on the type of the
> +  field. Field metadata values not present (those fields that do not use extra
> +  data) in the table map are initialized as zero (0). The array size is the
>    same as the columns for the table on the slave.
>  
>    Additionally, values saved for field metadata on the master are saved as a 
>    string of bytes (uchar) in the binlog. A field may require 1 or more bytes
> -  to store the information. In cases where values require multiple bytes 
> -  (e.g. values > 255), the endian-safe methods are used to properly encode 
> +  to store the information. In cases where values require multiple bytes
> +  (e.g. values > 255), the endian-safe methods are used to properly encode
>    the values on the master and decode them on the slave. When the field
>    metadata values are captured on the slave, they are stored in an array of
>    type uint16. This allows the least number of casts to prevent casting bugs

Avoid such unrelated changes (here whitespace-only changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

6 participants