Tests | Add collation transcoding tests #4051
This test previously covered only Kazakh_90_CI_AI and Georgian_Modern_Sort_CI_AS. The replacement tests every collation. It also checks more thoroughly that the string/byte[] roundtrips with the varbinary/varchar from the database instance. Finally, it no longer requires permission to drop and create databases: we can just use the COLLATE clause.
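The roundtrip check described above can be sketched without a database. This is a minimal illustration, not the PR's code: the class and method names (CollationRoundTrip, RoundTrips) are hypothetical, and the server-supplied string and bytes are passed in as plain arguments rather than read from a SqlDataReader. It assumes the code page encodings are available through CodePagesEncodingProvider (on .NET Framework this needs the System.Text.Encoding.CodePages package).

```csharp
using System;
using System.Text;

public static class CollationRoundTrip
{
    // Hypothetical helper (not from the PR): returns true when the string and
    // the bytes supplied by the server agree with each other under the
    // collation's code page, in both directions.
    public static bool RoundTrips(int codePage, string serverString, byte[] serverBytes)
    {
        // Legacy Windows code pages are not registered by default on modern .NET.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(codePage);

        // Encode the server's string client-side, and decode the server's bytes client-side.
        byte[] clientBytes = encoding.GetBytes(serverString);
        string clientString = encoding.GetString(serverBytes);

        return serverBytes.AsSpan().SequenceEqual(clientBytes)
            && clientString == serverString;
    }
}
```

For example, `CollationRoundTrip.RoundTrips(1252, "\u00E9", new byte[] { 0xE9 })` returns true, while passing the UTF-8 bytes `{ 0xC3, 0xA9 }` with code page 1252 returns false.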
[InlineData("KAZAKH_90_CI_AI")]
[InlineData("Georgian_Modern_Sort_CI_AS")]
public static void CollatedDataReaderTest(string collation)
[ConditionalFact(typeof(DataTestUtility), nameof(DataTestUtility.AreConnStringsSetup))]
Although this is treated as a Fact, it'll operate over every known collation. There are >5000 of these in SQL 2025 and I wasn't convinced that having each of them as a test case was valuable.
using (SqlCommand dbCmd = dbCon.CreateCommand())
{
string data = Guid.NewGuid().ToString();
Assert.True(codePageEncoding is not null,
This flows from the decision to keep the test as a Fact rather than a Theory: I need to specify which collation is failing an assertion using a message, and Assert.NotNull doesn't have this.
The same pattern plays out for the other instances of Assert.True.
Assert.True(collatedStringBytes.AsSpan().SequenceEqual(clientSideStringBytes),
$@"Collation ""{collationName}"", LCID {lcid}, code page {codePageId}: server-supplied byte array does not match client-side encoded bytes.");
// The character é does not exist in the Cyrillic character set, so do not compare the
This note is to explain why we have a different set of assertions between this test and CollatedStringInOutputParameter_DecodesSuccessfully.
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #4051 +/- ##
==========================================
- Coverage 74.62% 65.56% -9.07%
==========================================
Files 280 274 -6
Lines 43814 66778 +22964
==========================================
+ Hits 32698 43785 +11087
- Misses 11116 22993 +11877
Description
This PR replaces one test and introduces another.
CollatedDataReaderTest
Previously, CollatedDataReaderTest would create a brand new database with one of two collations (Kazakh_90_CI_AI and Georgian_Modern_Sort_CI_AS), then try to select an ASCII string from it. This had a few issues:
I've replaced this with something a little more robust. We now test every collation on a SQL instance, retrieving the character string and its byte representation, then we make sure that they roundtrip. It also uses a string containing the é character; this isn't present in Kazakh_90_CI_AI's code page, so SQL Server converts it to e. This is legitimate behaviour which would otherwise have failed.
CollatedStringInOutputParameter_DecodesSuccessfully
This is a new test which proves that we can roundtrip non-ASCII characters in output parameters when the value's collation's code page represents these non-ASCII characters differently: the default English Windows code page represents é as 0xE9; code page 936 represents this as [0xA8, 0xA6]; and UTF8 represents this as [0xC3, 0xA9].
Issues
Loosely related: #584 originally added the CollatedDataReaderTest test, and referred to making sure that the driver would maintain the collation/codepage mappings. These two tests provide sufficient test coverage for that effort (which would also unblock globalization invariant mode.)
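As a reference point for the code page mappings discussed here, the three byte representations of é quoted in the description can be reproduced client-side. This is a small sketch rather than code from the PR: the class name EAcuteBytes and its Encode helper are hypothetical, and it assumes that "the default English Windows code page" means code page 1252 and that UTF-8 is addressed as code page 65001.

```csharp
using System;
using System.Text;

public static class EAcuteBytes
{
    // Hypothetical helper (not from the PR): encodes text with the given code page.
    public static byte[] Encode(string text, int codePage)
    {
        // Code pages such as 1252 and 936 need this provider on modern .NET.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    public static void Main()
    {
        // Assumption: 1252 = default English Windows code page, 65001 = UTF-8.
        foreach (int codePage in new[] { 1252, 936, 65001 })
        {
            byte[] bytes = Encode("\u00E9", codePage); // U+00E9 is é
            Console.WriteLine($"{codePage}: {BitConverter.ToString(bytes)}");
        }
        // Prints:
        // 1252: E9
        // 936: A8-A6
        // 65001: C3-A9
    }
}
```

The same character therefore occupies one byte or two depending on the collation's code page, which is why the driver has to pick the decoder from the collation rather than assume a single encoding.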
Testing
The new tests run; the other tests and code remain untouched.