
Fix as_dataarray to apply coords parameter for DataArray input#551

Open
FBumann wants to merge 13 commits into PyPSA:master from FBumann:fix/as-dataarray


FBumann (Collaborator) commented Jan 20, 2026

Fix as_dataarray to validate coords for DataArray input

Problem

When as_dataarray() received a DataArray with a coords parameter, the coords were silently ignored. This was inconsistent with all other input types (numpy, pandas, scalar) where coords are applied. Users passing DataArray bounds to add_variables with explicit coords would get no validation — mismatched coordinates would silently produce wrong results.

Solution

Add coordinate validation for DataArray inputs in as_dataarray():

  • Shared dimensions must match exactly — raises ValueError on mismatch (no silent reindexing, which would introduce NaNs)
  • Extra dimensions are rejected by default — raises ValueError if the DataArray has dims not in coords
  • Missing dimensions are broadcast — dims in coords but not in the DataArray are added via expand_dims
  • No coords means no validation — DataArrays pass through unchanged when coords=None
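The first and third rules can be illustrated with plain xarray, independent of linopy. This minimal sketch (array names and values are illustrative only) shows why silent reindexing is avoided: labels missing from the DataArray come back as NaN instead of failing. Broadcasting a missing dimension via expand_dims, by contrast, only repeats existing values.

```python
import xarray as xr

# Bounds defined on time = 0..2, while the model expects time = 0..4.
lower = xr.DataArray([0.0, 0.0, 0.0], dims=["time"], coords={"time": [0, 1, 2]})

# Silent reindexing would not fail: labels 3 and 4 simply come back as NaN.
reindexed = lower.reindex(time=[0, 1, 2, 3, 4])
print(int(reindexed.isnull().sum()))  # 2

# Broadcasting a missing dimension is lossless: existing values are repeated
# along the new "y" dimension. expand_dims prepends the new dim, so transpose
# restores the (x, y) order of the coords.
lower_x = xr.DataArray([0.0, 1.0], dims=["x"], coords={"x": [0, 1]})
broadcast = lower_x.expand_dims(y=[0, 1, 2]).transpose("x", "y")
print(broadcast.dims, broadcast.shape)  # ('x', 'y') (2, 3)
```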

allow_extra_dims flag

Call sites that need xarray broadcasting for arithmetic opt in with allow_extra_dims=True:

| Call site | allow_extra_dims | Why |
|---|---|---|
| add_variables (bounds, mask) | False (default) | Extra dims in bounds indicate a shape mismatch bug |
| _multiply_by_constant | True | Broadcasting for arithmetic |
| Variable.to_linexpr | True | Broadcasting for coefficients |
| dot | True | Broadcasting for dot product |
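A minimal sketch of the extra-dimension rule, assuming a hypothetical helper `check_extra_dims` that stands in for the check inside as_dataarray (it is not part of linopy's API). It shows why arithmetic call sites opt in: xarray aligns operands by dimension name, so a coefficient with an extra dimension simply broadcasts the result, whereas the same array used as a bound would change the variable's shape.

```python
import xarray as xr


def check_extra_dims(arr, coords, allow_extra_dims=False):
    """Hypothetical helper: reject dims of `arr` that are not keys of `coords`."""
    extra = set(arr.dims) - set(coords)
    if extra and not allow_extra_dims:
        raise ValueError(f"DataArray has extra dimensions not in coords: {extra}")
    return arr


coords = {"x": [0, 1]}

# Coefficients with an extra "scenario" dimension are fine in arithmetic:
# xarray aligns operands by dimension name and broadcasts the result.
coeff = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=["x", "scenario"], coords=coords)
values = xr.DataArray([10, 20], dims=["x"], coords=coords)
product = check_extra_dims(coeff, coords, allow_extra_dims=True) * values
print(product.dims)  # ('x', 'scenario')

# Used as a variable bound, the same array would silently add a dimension to
# the variable, so the strict default rejects it.
try:
    check_extra_dims(coeff, coords)
except ValueError as e:
    print(e)  # DataArray has extra dimensions not in coords: {'scenario'}
```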

Performance

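This PR adds a validation step to every as_dataarray call that receives a DataArray together with coords (previously the coords were ignored for DataArray input; with coords=None the DataArray still passes through untouched). In my opinion the overhead is small and worth the added safety.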

Refactoring

  • _coords_to_mapping(): Normalizes coords (sequence or mapping) to a dict. Replaces duplicated dict(zip(dims, coords)) in pandas_to_dataarray, numpy_to_dataarray, and the new DataArray branch.
  • _validate_dataarray_coords(): Standalone validation function — normalizes coords, checks for extra dims, validates shared dim coords, expands missing dims. Single iteration over expected coords.
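The two helpers can be approximated as follows. This is a sketch under stated assumptions, not the PR's code: `coords_to_mapping` and `validate_coords` are hypothetical stand-ins for `_coords_to_mapping()` and `_validate_dataarray_coords()`, sequence coords are assumed to be named pandas Indexes when `dims` is not given, and dimension order after broadcasting is assumed to follow the order of `coords`.

```python
from collections.abc import Mapping

import pandas as pd
import xarray as xr


def coords_to_mapping(coords, dims=None):
    """Hypothetical helper: normalize mapping or sequence coords to {dim: pd.Index}."""
    if isinstance(coords, Mapping):
        return {dim: pd.Index(values) for dim, values in coords.items()}
    if dims is None:
        # Assumption: without explicit dims, each entry is a named pandas Index.
        dims = [index.name for index in coords]
    return {dim: pd.Index(values) for dim, values in zip(dims, coords)}


def validate_coords(arr, coords, dims=None, allow_extra_dims=False):
    """Hypothetical helper: validate a DataArray against coords, broadcasting missing dims."""
    expected = coords_to_mapping(coords, dims)

    # 1. Reject dimensions that coords does not know about (unless opted in).
    extra = set(arr.dims) - set(expected)
    if extra and not allow_extra_dims:
        raise ValueError(f"DataArray has extra dimensions not in coords: {extra}")

    # 2. Single pass over expected coords: shared dims must match exactly,
    #    dims absent from the array are collected for broadcasting.
    missing = {}
    for dim, index in expected.items():
        if dim in arr.dims:
            # get_index falls back to a default RangeIndex if dim has no coordinate.
            if not arr.get_index(dim).equals(index):
                raise ValueError(f"Coordinates for dimension '{dim}' do not match")
        else:
            missing[dim] = index

    # 3. Broadcast missing dims, then order dims as in coords (extra dims last).
    if missing:
        arr = arr.expand_dims(missing)
    return arr.transpose(*expected, ...)


lower = xr.DataArray([0, 0], dims=["x"], coords={"x": [0, 1]})
out = validate_coords(lower, {"x": [0, 1], "y": [0, 1, 2]})
print(out.dims, out.shape)  # ('x', 'y') (2, 3)
```

Validating the extra-dims rule before the per-dimension loop keeps the common failure (a bound with the wrong shape) cheap, and the single loop over `expected` both checks shared dims and collects the ones to broadcast.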

Inconsistencies

  • Coordinate validation is also performed when converting pandas objects to a DataArray, but there a mismatch only raises a warning rather than a ValueError.
  • Mask broadcasting currently emits a FutureWarning and handles broadcasting differently. For missing dims it is consistent with as_dataarray, but it does not raise on misaligned coords; instead it fills them with False. This behaviour is kept (see 1bab496).
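To make the mask inconsistency concrete, here is a sketch of the lenient behaviour described above, using plain xarray rather than linopy's mask code. The reindex call with fill_value=False is an assumption about how the False fill can be expressed, not the actual implementation.

```python
import pandas as pd
import xarray as xr

# Coordinates the model expects for dimension "x".
target_x = [0, 1, 2, 3]

# A mask that only covers some of those labels, i.e. misaligned coords.
mask = xr.DataArray([True, True], dims=["x"], coords={"x": [0, 1]})

# Strict DataArray validation compares the indexes and would raise here.
strict_ok = mask.get_index("x").equals(pd.Index(target_x))
print(strict_ok)  # False

# The lenient mask path instead fills unmatched labels with False,
# so those entries end up masked out rather than triggering an error.
aligned = mask.reindex(x=target_x, fill_value=False)
print(aligned.values.tolist())  # [True, True, False, False]
```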

Examples

```python
import pandas as pd
import xarray as xr
from linopy import Model
from linopy.common import as_dataarray

m = Model()

# Mismatched coords → ValueError
lower = xr.DataArray([0, 0, 0], dims=["time"], coords={"time": [0, 1, 2]})
try:
    m.add_variables(lower=lower, coords=[pd.RangeIndex(5, name="time")], name="x")
except ValueError as e:
    print(e)  # Coordinates for dimension 'time' do not match

# Extra dims → ValueError
da = xr.DataArray([[1, 2], [3, 4]], dims=["x", "y"])
try:
    as_dataarray(da, coords={"x": [0, 1]})
except ValueError as e:
    print(e)  # DataArray has extra dimensions not in coords: {'y'}

# Subset dims → broadcast
lower = xr.DataArray([0, 0], dims=["x"], coords={"x": [0, 1]})
m.add_variables(lower=lower, coords={"x": [0, 1], "y": [0, 1, 2]}, name="x")
# works, lower broadcast to shape (2, 3)

# No coords → pass through unchanged
as_dataarray(da)  # no validation, zero overhead
```

Checklist

  • Code changes are sufficiently documented
  • Unit tests pass (2077 passed)
  • A note for the release notes
  • I consent to the release of this PR's code under the MIT license

FBumann and others added 2 commits January 20, 2026 20:26
Previously, when a DataArray was passed to as_dataarray(), the coords
parameter was silently ignored. This was inconsistent with other input
types (numpy, pandas) where coords are applied.

Now, when coords is provided as a dict and the input is a DataArray,
the function will reindex the array to match the provided coordinates.
This ensures consistent behavior across all input types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
  2. Expands to new dims from coords (broadcast)

  Summary:
  - as_dataarray now consistently applies coords for all input types
  - DataArrays with fewer dims are expanded to match the full coords specification
  - This fixes the inconsistency when creating variables with DataArray bounds
FBumann (Collaborator, Author) commented Feb 18, 2026

Maybe remove the explicit broadcast from the masking code added in a prior patch.

FBumann and others added 10 commits February 19, 2026 07:43
Replace reindex with a strict equality check for DataArray inputs.
Silent reindexing is dangerous as it introduces NaNs for missing
indices and drops unmatched ones, masking user bugs. Now raises
ValueError if coords don't match, while still allowing expand_dims
for broadcasting to new dimensions.

Strict by default: raises ValueError if a DataArray has dimensions
not present in coords. Call sites that need broadcasting (multiply,
dot, add) opt in with allow_extra_dims=True. Structural call sites
like add_variables bounds/mask remain strict.

When coords is a sequence (e.g. from add_variables), convert it to a
dict using dims or Index names so the same validation applies. This
closes the gap where sequence coords were silently ignored for
DataArray inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Unifies the sequence-to-dict coords conversion used in
pandas_to_dataarray, numpy_to_dataarray, and the DataArray branch
of as_dataarray into a single helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document the coord validation and broadcasting behavior from the
user perspective.
FBumann marked this pull request as ready for review February 19, 2026 08:29