Skip to content

FIX: Request SQL_CHAR as SQL_C_WCHAR in arrow fetch path#575

Open
ffelixg wants to merge 3 commits into
microsoft:mainfrom
ffelixg:arrow_char_to_wchar
Open

FIX: Request SQL_CHAR as SQL_C_WCHAR in arrow fetch path#575
ffelixg wants to merge 3 commits into
microsoft:mainfrom
ffelixg:arrow_char_to_wchar

Conversation

@ffelixg
Copy link
Copy Markdown
Contributor

@ffelixg ffelixg commented May 13, 2026

Work Item / Issue Reference

AB#44922

GitHub Issue: #553


Summary

Due to #495, we can now request SQL_CHAR data as SQL_C_WCHAR, i.e. utf16le strings. Doing this for the arrow path ensures that arrow methods always return correct data no matter the encoding settings / locale / operating system. There does not seem to be any significant negative performance impact.

Copilot AI review requested due to automatic review settings May 13, 2026 11:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the Arrow fetch path in the C++ pybind layer to always request SQL_CHAR/SQL_VARCHAR data as SQL_C_WCHAR (UTF-16) so Arrow results are correct regardless of server/client codepage, locale, or platform—addressing the VARCHAR non-ASCII decoding issues reported in #553.

Changes:

  • Switch Arrow batch binding/fetching for SQL_CHAR/SQL_VARCHAR from SQL_C_CHAR to SQL_C_WCHAR to avoid codepage-dependent decoding.
  • Remove the narrow-char copy path for SQL_CHAR/SQL_VARCHAR in Arrow batch production and route through the existing wide-char → UTF-8 conversion logic.
  • Add an Arrow regression test covering Unicode round-tripping through a UTF-8-collated VARCHAR column.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
mssql_python/pybind/ddbc_bindings.cpp Changes Arrow batch binding and fetch handling so VARCHAR is requested as SQL_C_WCHAR, ensuring consistent Unicode correctness.
tests/test_004_cursor_arrow.py Adds a regression test to validate Arrow output for Unicode data stored in VARCHAR with UTF-8 collation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_004_cursor_arrow.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants