GH-22232: [C++][Python] Introduce optional default_column_type parameter#47663
GH-22232: [C++][Python] Introduce optional default_column_type parameter#47663vladborovtsov wants to merge 10 commits intoapache:mainfrom
Conversation
|
|
|
|
|
|
1 similar comment
|
|
|
|
|
|
|
@github-actions crossbow submit preview-docs |
|
|
|
|
|
|
Thank you @vladborovtsov for the contribution. |
|
|
|
Hi @AlenkaF |
|
Happy to see a response! I will wait for an opinion from a C++ dev and in the meantime try to look at the Python part. |
|
Hi @AlenkaF |
There was a problem hiding this comment.
I only read the c++ part. I think the way option is defined and checked is good.
A nit: I'd move c++ tests into arrow/cpp/src/arrow/csv/column_builder_test.cc and use available machinery there. See how CheckInferred is used here:
arrow/cpp/src/arrow/csv/column_builder_test.cc
Lines 441 to 457 in ad9279e
Edit: column_builder_test.cc is checking behavior only on single columns, perhaps keep a couple multi-column tests in the current location to test for that.
|
There is not too much to add into I’ve added a guard test to ensure that my changes don’t override the type inference logic. Not really sure if it it makes sense to have them there. As for the reader tests, I’ve tried to keep them representative while minimizing the amount of lines with recent changes. |
raulcd
left a comment
There was a problem hiding this comment.
I am unsure about the approach as I am not an expert on the C++ side of the CSV reader. I'll investigate a little. In the meantime, could you fix the related lint failure and rebase main to fix some of the CI issues?
Rationale for this change
Add an optional default_column_type parameter to the CSV reading API (C++ and Python) to provide a fallback type when per-column types aren’t specified, improving schema consistency and complementing the existing column_types logic.
What changes are included in this PR?
Are these changes tested?
Yes. Existing and new tests are passing.
C++:
pyarrow:
New tests are passing.
Are there any user-facing changes?
I believe this change is backward compatible. Parameter is optional and its default value doesn't change the existing behavior; All the existing rests are passing.
Maybe relevant: #22232
Relates to #47502
GitHub Issue: [C++] CSV reader: add a default column type (or sentinel mapping) to avoid per-column enumeration #47502
GitHub Issue: [C++] CSV reader: Ability to not infer column types. #22232