32 commits
986cd28
Add glob pattern utility functions for group filtering
aladinor Apr 16, 2026
6d29b6d
Add glob pattern filtering to h5netcdf backend
aladinor Apr 16, 2026
6fe80b3
Add glob pattern filtering to netCDF4 backend
aladinor Apr 16, 2026
7a3e3bc
Add glob pattern filtering to zarr backend
aladinor Apr 16, 2026
16f9e12
Document glob pattern support in open_datatree and open_groups
aladinor Apr 16, 2026
c17f134
Add whats-new entry for glob pattern group filtering
aladinor Apr 16, 2026
5fb46e1
Add tests for glob pattern group filtering
aladinor Apr 16, 2026
c030089
Simplify _filter_group_paths with set.update
aladinor Apr 16, 2026
cd5485d
Test glob-metachar escaping via character classes
aladinor Apr 22, 2026
0340696
Document glob-metachar escaping in open_datatree/open_groups docstrings
aladinor Apr 22, 2026
e2cf518
Align zarr escape test with NetCDF parity (add plain_01 case)
aladinor Apr 22, 2026
612842d
Skip zarr escape test on Windows (filesystem rejects * and ? in names)
aladinor Apr 22, 2026
7c7e9c0
Use MemoryStore for zarr escape test instead of skipping on Windows
aladinor Apr 22, 2026
a1dd28f
Add type: ignore for MemoryStore in open_datatree calls
aladinor Apr 22, 2026
5eb4cc8
Merge branch 'main' into glob-group-filtering-standalone
aladinor Apr 22, 2026
ff2aa61
Merge branch 'main' into glob-group-filtering-standalone
kmuehlbauer May 8, 2026
c43d614
Replace _resolve_group_and_filter with single-purpose helpers
aladinor May 28, 2026
5303c7a
Document group_filter kwarg in open_datatree and open_groups
aladinor May 28, 2026
3fe28bc
Add group_filter kwarg to h5netcdf open_groups_as_dict
aladinor May 28, 2026
161ba8d
Add group_filter kwarg to netCDF4 open_groups_as_dict
aladinor May 28, 2026
b33bf4c
Add group_filter kwarg to zarr open_datatree and open_groups_as_dict
aladinor May 28, 2026
3b07907
Migrate backend datatree tests to group_filter kwarg
aladinor May 28, 2026
a1d33dc
Merge remote-tracking branch 'upstream/main' into glob-group-filterin…
aladinor May 28, 2026
c633de2
docs: describe group_filter kwarg in whats-new
aladinor May 28, 2026
22a2405
h5netcdf: explicit group_filter kwarg; drop dead ternary
aladinor May 28, 2026
1921470
netCDF4: explicit group_filter kwarg; drop dead ternary
aladinor May 28, 2026
1287d47
zarr: push group_filter into ZarrStore.open_store
aladinor May 28, 2026
271e411
docs(api): correct group_filter docstring on open_datatree/open_groups
aladinor May 28, 2026
956cf3a
common: reject empty group_filter; tighten Sequence type
aladinor May 28, 2026
765e62a
tests(datatree): tighten assertions; add edge + zarr-parity tests
aladinor May 28, 2026
4da335f
docs(whats-new): switch group_filter example from sweep_0 to leaf_0
aladinor May 28, 2026
920903f
tests: drop /leaf_0 from right-anchored fixture (py3.11/3.13 skew)
aladinor May 28, 2026
8 changes: 8 additions & 0 deletions doc/whats-new.rst
@@ -75,6 +75,14 @@ New Features
or a fixed ``(width, height)`` tuple instead of computing figure size from
``size`` and ``aspect`` (:issue:`11103`).
By `Kristian Kollsga <https://github.com/kkollsga>`_.
- Added ``group_filter`` keyword to :py:func:`open_datatree` and
:py:func:`open_groups`, accepting an ``fnmatch``-style glob pattern
(e.g. ``"*/leaf_0"``) to selectively open matching groups. Mutually
exclusive with ``group``, which keeps its exact-path semantics.
Groups whose names literally contain ``*`` or ``?`` are reachable via
character-class escapes (``[*]``, ``[?]``)
(:issue:`11196`, :pull:`11302`).
By `Alfonso Ladino <https://github.com/aladinor>`_.

Breaking Changes
~~~~~~~~~~~~~~~~
34 changes: 30 additions & 4 deletions xarray/backends/api.py
@@ -1021,8 +1021,21 @@ def open_datatree(
Additional keyword arguments passed on to the engine open function.
For example:

- 'group': path to the group in the given file to open as the root group as
a str.
- 'group': path to the group in the given file to open as the root
group as a str. Mutually exclusive with ``'group_filter'``.
- 'group_filter': non-empty glob pattern matched against every
group path in the file. Only groups whose paths match the
pattern are loaded, along with their ancestors so the resulting
tree stays connected. Matching follows
:py:meth:`pathlib.PurePath.match` semantics: the pattern is
anchored on the right, so ``group_filter="*/leaf_0"`` matches
any group whose path ends in ``<segment>/leaf_0`` at any depth.
Group names that contain literal glob metacharacters can be
targeted with character-class escapes: ``[*]`` matches a literal
``*``, ``[?]`` a literal ``?``, and ``[[]`` a literal ``[``. For
example, ``group_filter="group_[*]_01"`` matches a group
literally named ``group_*_01``. Mutually exclusive with
``'group'``.
- 'lock': resource lock to use when reading data from disk. Only
relevant when using dask or another form of parallelism. By default,
appropriate locks are chosen to safely read and write files with the
@@ -1265,8 +1278,21 @@ def open_groups(
Additional keyword arguments passed on to the engine open function.
For example:

- 'group': path to the group in the given file to open as the root group as
a str.
- 'group': path to the group in the given file to open as the root
group as a str. Mutually exclusive with ``'group_filter'``.
- 'group_filter': non-empty glob pattern matched against every
group path in the file. Only groups whose paths match the
pattern are loaded, along with their ancestors so the resulting
tree stays connected. Matching follows
:py:meth:`pathlib.PurePath.match` semantics: the pattern is
anchored on the right, so ``group_filter="*/leaf_0"`` matches
any group whose path ends in ``<segment>/leaf_0`` at any depth.
Group names that contain literal glob metacharacters can be
targeted with character-class escapes: ``[*]`` matches a literal
``*``, ``[?]`` a literal ``?``, and ``[[]`` a literal ``[``. For
example, ``group_filter="group_[*]_01"`` matches a group
literally named ``group_*_01``. Mutually exclusive with
``'group'``.
- 'lock': resource lock to use when reading data from disk. Only
relevant when using dask or another form of parallelism. By default,
appropriate locks are chosen to safely read and write files with the
Expand Down
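As a rough illustration of the matching rules described in the ``group_filter`` docstring above, the sketch below calls ``pathlib.PurePosixPath.match`` directly. The helper name ``matches`` and the sample group paths are hypothetical, and the standard-library class stands in for xarray's ``NodePath`` on the assumption that both share the same ``match`` behaviour. It deliberately avoids top-level paths such as ``/leaf_0``, where the result appears to depend on the Python version (the commit list mentions a py3.11/3.13 skew).

```python
from pathlib import PurePosixPath


def matches(path: str, pattern: str) -> bool:
    """Hypothetical helper: right-anchored glob match on a group path."""
    return PurePosixPath(path).match(pattern)


# A relative pattern is anchored on the right, so depth does not matter.
print(matches("/a/leaf_0", "*/leaf_0"))      # True
print(matches("/a/b/leaf_0", "*/leaf_0"))    # True
print(matches("/a/leaf_1", "*/leaf_0"))      # False

# Character classes escape literal glob metacharacters in group names.
print(matches("/group_*_01", "group_[*]_01"))  # True
print(matches("/group_x_01", "group_[*]_01"))  # False
print(matches("/what?", "what[?]"))            # True
print(matches("/a[b", "a[[]b"))                # True
```

Because ``*`` only ever matches within a single path segment, ``"*/leaf_0"`` needs at least one parent segment in front of ``leaf_0``.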
35 changes: 35 additions & 0 deletions xarray/backends/common.py
@@ -249,6 +249,41 @@ def _iter_nc_groups(root, parent="/"):
yield from _iter_nc_groups(group, parent=gpath)


def _check_group_filter_mutex(group: str | None, group_filter: str | None) -> None:
"""Validate ``group`` / ``group_filter`` are not both set, and ``group_filter``
is non-empty when provided.
"""
if group is not None and group_filter is not None:
raise ValueError(
"group and group_filter are mutually exclusive: "
"group selects an exact group path while group_filter "
"is a glob pattern over all group paths."
)
if group_filter == "":
raise ValueError("group_filter must be a non-empty glob pattern")


def _filter_group_paths(group_paths: Sequence[str], pattern: str) -> list[str]:
"""Return the subset of ``group_paths`` whose paths match ``pattern``,
plus every ancestor of a match (so the resulting tree stays
connected). The root path ``"/"`` is always included.

``pattern`` is matched with :py:meth:`pathlib.PurePath.match` semantics,
so it is anchored on the right: ``"*/leaf_0"`` matches a group whose
path ends in any single segment followed by ``leaf_0`` at any depth.
"""
from xarray.core.treenode import NodePath

matched: set[str] = {"/"}
for path in group_paths:
np_ = NodePath(path)
if np_.match(pattern):
matched.add(path)
matched.update(str(p) for p in np_.parents)

return [p for p in group_paths if p in matched]


def find_root_and_group(ds):
"""Find the root and group name of a netCDF4/h5netcdf dataset."""
hierarchy = ()
Expand Down
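The two helpers added to ``common.py`` above can be tried outside xarray with a small standalone sketch. It is not the PR's code: ``check_group_filter`` and ``filter_group_paths`` are hypothetical names that mirror the logic shown in the diff, ``pathlib.PurePosixPath`` replaces ``NodePath`` (assumed to behave the same for ``match`` and ``parents``), and the group paths are made up.

```python
from collections.abc import Sequence
from pathlib import PurePosixPath
from typing import Optional


def check_group_filter(group: Optional[str], group_filter: Optional[str]) -> None:
    """Reject group + group_filter together, and an empty group_filter."""
    if group is not None and group_filter is not None:
        raise ValueError("group and group_filter are mutually exclusive")
    if group_filter == "":
        raise ValueError("group_filter must be a non-empty glob pattern")


def filter_group_paths(group_paths: Sequence[str], pattern: str) -> list[str]:
    """Keep matching paths plus their ancestors; the root is always kept."""
    matched = {"/"}
    for path in group_paths:
        node = PurePosixPath(path)
        if node.match(pattern):
            matched.add(path)
            # Ancestors keep the resulting tree connected.
            matched.update(str(parent) for parent in node.parents)
    # Preserve the original ordering of group_paths.
    return [p for p in group_paths if p in matched]


# Made-up hierarchy: only /a and /b contain a leaf_0 child.
paths = ["/", "/a", "/a/leaf_0", "/a/leaf_1", "/b", "/b/leaf_0", "/c", "/c/leaf_1"]
selected = filter_group_paths(paths, "*/leaf_0")
print(selected)  # ['/', '/a', '/a/leaf_0', '/b', '/b/leaf_0']
```

Here ``/c`` is dropped because none of its descendants match, while ``/a`` and ``/b`` survive only as ancestors of a match.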
23 changes: 18 additions & 5 deletions xarray/backends/h5netcdf_.py
@@ -602,6 +602,7 @@ def open_datatree(
decode_timedelta=None,
format="NETCDF4",
group: str | None = None,
group_filter: str | None = None,
lock=None,
invalid_netcdf=None,
phony_dims=None,
@@ -621,6 +622,7 @@ def open_datatree(
decode_timedelta=decode_timedelta,
format=format,
group=group,
group_filter=group_filter,
lock=lock,
invalid_netcdf=invalid_netcdf,
phony_dims=phony_dims,
@@ -645,6 +647,7 @@ def open_groups_as_dict(
decode_timedelta=None,
format="NETCDF4",
group: str | None = None,
group_filter: str | None = None,
lock=None,
invalid_netcdf=None,
phony_dims=None,
@@ -655,15 +658,22 @@ def open_groups_as_dict(
open_kwargs: dict[str, Any] | None = None,
**kwargs,
) -> dict[str, Dataset]:
from xarray.backends.common import _iter_nc_groups
from xarray.backends.common import (
_check_group_filter_mutex,
_filter_group_paths,
_iter_nc_groups,
)
from xarray.core.treenode import NodePath
from xarray.core.utils import close_on_error

_check_group_filter_mutex(group, group_filter)

# Keep this message for some versions
# remove and set phony_dims="access" above
emit_phony_dims_warning, phony_dims = _check_phony_dims(phony_dims)

filename_or_obj = _normalize_filename_or_obj(filename_or_obj)

store = H5NetCDFStore.open(
filename_or_obj,
format=format,
@@ -678,15 +688,18 @@ def open_groups_as_dict(
open_kwargs=open_kwargs,
)

# Check for a group and make it a parent if it exists
if group:
if group is not None:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
group_paths = list(_iter_nc_groups(store.ds, parent=parent))
if group_filter is not None:
group_paths = _filter_group_paths(group_paths, group_filter)

groups_dict = {}
for path_group in _iter_nc_groups(store.ds, parent=parent):
for path_group in group_paths:
group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
@@ -701,7 +714,7 @@ def open_groups_as_dict(
decode_timedelta=decode_timedelta,
)

if group:
if group is not None:
group_name = str(NodePath(path_group).relative_to(parent))
else:
group_name = str(NodePath(path_group))
23 changes: 18 additions & 5 deletions xarray/backends/netCDF4_.py
@@ -807,6 +807,7 @@ def open_datatree(
use_cftime=None,
decode_timedelta=None,
group: str | None = None,
group_filter: str | None = None,
format="NETCDF4",
clobber=True,
diskless=False,
@@ -826,6 +827,7 @@ def open_datatree(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
group=group,
group_filter=group_filter,
format=format,
clobber=clobber,
diskless=diskless,
@@ -850,6 +852,7 @@ def open_groups_as_dict(
use_cftime=None,
decode_timedelta=None,
group: str | None = None,
group_filter: str | None = None,
format="NETCDF4",
clobber=True,
diskless=False,
@@ -859,10 +862,17 @@ def open_groups_as_dict(
autoclose=False,
**kwargs,
) -> dict[str, Dataset]:
from xarray.backends.common import _iter_nc_groups
from xarray.backends.common import (
_check_group_filter_mutex,
_filter_group_paths,
_iter_nc_groups,
)
from xarray.core.treenode import NodePath

_check_group_filter_mutex(group, group_filter)

filename_or_obj = _normalize_path(filename_or_obj)

store = NetCDF4DataStore.open(
filename_or_obj,
group=group,
@@ -875,15 +885,18 @@ def open_groups_as_dict(
autoclose=autoclose,
)

# Check for a group and make it a parent if it exists
if group:
if group is not None:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
group_paths = list(_iter_nc_groups(store.ds, parent=parent))
if group_filter is not None:
group_paths = _filter_group_paths(group_paths, group_filter)

groups_dict = {}
for path_group in _iter_nc_groups(store.ds, parent=parent):
for path_group in group_paths:
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
@@ -897,7 +910,7 @@ def open_groups_as_dict(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
if group:
if group is not None:
group_name = str(NodePath(path_group).relative_to(parent))
else:
group_name = str(NodePath(path_group))
26 changes: 22 additions & 4 deletions xarray/backends/zarr.py
@@ -680,6 +680,7 @@ def open_store(
mode: ZarrWriteModes = "r",
synchronizer=None,
group=None,
group_filter: str | None = None,
consolidated=False,
consolidate_on_close=False,
chunk_store=None,
@@ -715,8 +716,15 @@ def open_store(

from zarr import Group

from xarray.backends.common import _filter_group_paths

group_members: dict[str, Group] = {}
group_paths = list(_iter_zarr_groups(zarr_group, parent=group))
# Filter before materializing child Group objects: each
# ``zarr_group[rel_path]`` lookup triggers metadata I/O, so
# pruning paths up-front skips the cost for groups we'd discard.
if group_filter is not None:
group_paths = _filter_group_paths(group_paths, group_filter)
for path in group_paths:
if path == group:
group_members[path] = zarr_group
@@ -1779,6 +1787,7 @@ def open_datatree(
use_cftime=None,
decode_timedelta=None,
group: str | None = None,
group_filter: str | None = None,
mode="r",
synchronizer=None,
consolidated=None,
@@ -1798,6 +1807,7 @@ def open_datatree(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
group=group,
group_filter=group_filter,
mode=mode,
synchronizer=synchronizer,
consolidated=consolidated,
@@ -1821,6 +1831,7 @@ def open_groups_as_dict(
use_cftime=None,
decode_timedelta=None,
group: str | None = None,
group_filter: str | None = None,
mode="r",
synchronizer=None,
consolidated=None,
@@ -1829,17 +1840,21 @@ def open_groups_as_dict(
zarr_version=None,
zarr_format=None,
) -> dict[str, Dataset]:
from xarray.backends.common import _check_group_filter_mutex

_check_group_filter_mutex(group, group_filter)

filename_or_obj = _normalize_path(filename_or_obj)

# Check for a group and make it a parent if it exists
if group:
if group is not None:
parent = str(NodePath("/") / NodePath(group))
else:
parent = str(NodePath("/"))

stores = ZarrStore.open_store(
filename_or_obj,
group=parent,
group_filter=group_filter,
mode=mode,
synchronizer=synchronizer,
consolidated=consolidated,
@@ -1850,8 +1865,11 @@ def open_groups_as_dict(
zarr_format=zarr_format,
)

group_paths = list(stores.keys())

groups_dict = {}
for path_group, store in stores.items():
for path_group in group_paths:
store = stores[path_group]
store_entrypoint = StoreBackendEntrypoint()

with close_on_error(store):
@@ -1865,7 +1883,7 @@ def open_groups_as_dict(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
if group:
if group is not None:
group_name = str(NodePath(path_group).relative_to(parent))
else:
group_name = str(NodePath(path_group))