Fix as_dataarray to apply coords parameter for DataArray input #551
Open
FBumann wants to merge 13 commits into PyPSA:master from
Previously, when a DataArray was passed to as_dataarray(), the coords parameter was silently ignored. This was inconsistent with other input types (numpy, pandas) where coords are applied. Now, when coords is provided as a dict and the input is a DataArray, the function will reindex the array to match the provided coordinates. This ensures consistent behavior across all input types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2. Expands to new dims from coords (broadcast)

Summary:
- as_dataarray now consistently applies coords for all input types
- DataArrays with fewer dims are expanded to match the full coords specification
- This fixes the inconsistency when creating variables with DataArray bounds
Maybe remove explicit broadcast from masking added in a prior patch
Replace reindex with a strict equality check for DataArray inputs. Silent reindexing is dangerous as it introduces NaNs for missing indices and drops unmatched ones, masking user bugs. Now raises ValueError if coords don't match, while still allowing expand_dims for broadcasting to new dimensions.
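To illustrate why silent reindexing is risky, here is a minimal sketch using plain xarray and pandas. The array, the labels, and the `check_equal` helper are made up for illustration and are not taken from the PR's code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A DataArray indexed by x = 0, 1, 2 (illustrative values).
bounds = xr.DataArray([10.0, 20.0, 30.0], coords={"x": [0, 1, 2]}, dims="x")

# Coordinates the caller expects: label 0 is absent and label 3 is new.
expected_x = pd.Index([1, 2, 3], name="x")

# Silent reindexing drops label 0 and fills label 3 with NaN, without any warning.
reindexed = bounds.reindex(x=expected_x)


def check_equal(da: xr.DataArray, dim: str, expected: pd.Index) -> None:
    """Hypothetical strict check: raise instead of reindexing on a mismatch."""
    actual = da.indexes[dim]
    if not actual.equals(expected):
        raise ValueError(
            f"Coordinates for dimension '{dim}' do not match: "
            f"expected {expected.tolist()}, got {actual.tolist()}"
        )
```

With these inputs, `reindexed` carries a NaN at `x=3` and no longer holds the value for `x=0`, while `check_equal(bounds, "x", expected_x)` raises a `ValueError` that points at the mismatch.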
Strict by default: raises ValueError if a DataArray has dimensions not present in coords. Call sites that need broadcasting (multiply, dot, add) opt in with allow_extra_dims=True. Structural call sites like add_variables bounds/mask remain strict.
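The rules described in the commits above could look roughly like the following sketch. The function name `validate_coords_sketch` and its exact signature are assumptions for illustration; the PR's actual implementation may differ. It also assumes every dimension of the input carries an index.

```python
import pandas as pd
import xarray as xr


def validate_coords_sketch(da, coords, allow_extra_dims=False):
    """Hypothetical stand-in for the validation described above (not linopy code)."""
    # Strict by default: dims of the DataArray that coords does not know about.
    extra = [d for d in da.dims if d not in coords]
    if extra and not allow_extra_dims:
        raise ValueError(f"DataArray has dimensions not in coords: {extra}")

    missing = {}
    for dim, values in coords.items():
        if dim in da.dims:
            # Shared dim: labels must match exactly, no reindexing.
            if not da.indexes[dim].equals(pd.Index(values)):
                raise ValueError(f"Coordinates for dimension '{dim}' do not match")
        else:
            missing[dim] = list(values)
    # Dims present only in coords are added (broadcast) with their labels.
    return da.expand_dims(missing) if missing else da


coords = {"x": [0, 1, 2], "y": ["a", "b"]}
lower = xr.DataArray([1.0, 2.0, 3.0], coords={"x": [0, 1, 2]}, dims="x")
expanded = validate_coords_sketch(lower, coords)  # gains the "y" dimension

# An array with an extra "t" dimension is rejected unless the caller opts in.
wide = xr.DataArray(
    [[1.0, 2.0, 3.0]], coords={"t": [0], "x": [0, 1, 2]}, dims=("t", "x")
)
kept = validate_coords_sketch(wide, {"x": [0, 1, 2]}, allow_extra_dims=True)
```

In this sketch, a structural call site would use the default (`allow_extra_dims=False`), while an arithmetic call site that relies on xarray broadcasting would pass `allow_extra_dims=True`.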
When coords is a sequence (e.g. from add_variables), convert it to a dict using dims or Index names so the same validation applies. This closes the gap where sequence coords were silently ignored for DataArray inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unifies the sequence-to-dict coords conversion used in pandas_to_dataarray, numpy_to_dataarray, and the DataArray branch of as_dataarray into a single helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
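A sequence-to-dict normalization of this kind might be sketched as below, using only the standard library. `coords_to_mapping_sketch` and `NamedLabels` are hypothetical names; in particular, raising on unnamed entries is an assumption here, not confirmed behavior of the PR's helper.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class NamedLabels:
    """Hypothetical stand-in for a named index such as pandas.Index."""

    name: str
    values: list


def coords_to_mapping_sketch(coords, dims=None):
    """Normalize coords (mapping or sequence) into a plain dict keyed by dim."""
    if isinstance(coords, Mapping):
        return dict(coords)
    coords = list(coords)
    if dims is not None:
        # Pair each entry with the explicitly given dimension name.
        dims = [dims] if isinstance(dims, str) else list(dims)
        if len(dims) != len(coords):
            raise ValueError("dims and coords must have the same length")
        return dict(zip(dims, coords))
    # Without dims, fall back on each entry's own name attribute.
    names = [getattr(c, "name", None) for c in coords]
    if any(n is None for n in names):
        # Assumption of this sketch: unnamed entries are an error.
        raise ValueError("unnamed coords require explicit dims")
    return dict(zip(names, coords))


by_dims = coords_to_mapping_sketch([[0, 1, 2], ["a", "b"]], dims=("x", "y"))
by_name = coords_to_mapping_sketch([NamedLabels("x", [0, 1, 2])])
```

Routing every input type through one such helper means the validation only ever has to deal with the dict form.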
Document the coord validation and broadcasting behavior from the user perspective.
Fix as_dataarray to validate coords for DataArray input
Problem
When `as_dataarray()` received a DataArray with a `coords` parameter, the coords were silently ignored. This was inconsistent with all other input types (numpy, pandas, scalar), where coords are applied. Users passing DataArray bounds to `add_variables` with explicit coords got no validation: mismatched coordinates silently produced wrong results.

Solution
Add coordinate validation for DataArray inputs in `as_dataarray()`:

- `ValueError` on mismatch of coordinates on shared dims (no silent reindexing, which would introduce NaNs)
- `ValueError` if the DataArray has dims not in `coords`
- Dims in `coords` but not in the DataArray are added via `expand_dims`
- `coords=None`

`allow_extra_dims` flag
Call sites that need xarray broadcasting for arithmetic opt in with `allow_extra_dims=True`:

| Call site | `allow_extra_dims` |
| --- | --- |
| `add_variables(bounds, mask)` | `False` (default) |
| `_multiply_by_constant` | `True` |
| `Variable.to_linexpr` | `True` |
| `dot` | `True` |

Performance
This PR introduces a validation call on every `as_dataarray` call that receives a DataArray (previously, coords were ignored for such inputs, so no check ran).
This should be fine performance-wise imo, and worth it.
Refactoring
- `_coords_to_mapping()`: Normalizes coords (sequence or mapping) to a dict. Replaces duplicated `dict(zip(dims, coords))` in `pandas_to_dataarray`, `numpy_to_dataarray`, and the new DataArray branch.
- `_validate_dataarray_coords()`: Standalone validation function that normalizes coords, checks for extra dims, validates shared dim coords, and expands missing dims. Single iteration over expected coords.

Inconsistencies
Examples
Checklist