diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index 6515a2c4c2..fb5487ada4 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -24,7 +24,10 @@ jobs:
           persist-credentials: false
       - uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
       - run: uv sync --group docs
-      - run: uv run mkdocs build
+      # --strict turns warnings into errors, so a docs code block that fails to execute
+      # at build time (e.g. a non-exec python fence disrupting a later exec="true" block)
+      # fails CI instead of merging as a silent warning.
+      - run: uv run mkdocs build --strict
         env:
           DISABLE_MKDOCS_2_WARNING: "true"
           NO_MKDOCS_2_WARNING: "true"
diff --git a/changes/4016.bugfix.md b/changes/4016.bugfix.md
new file mode 100644
index 0000000000..01984110f7
--- /dev/null
+++ b/changes/4016.bugfix.md
@@ -0,0 +1 @@
+Fixed an invalid `zarr.create_array` example in the quick-start documentation (it passed an unsupported `mode` argument) and made the cloud-storage example execute against a mock S3 backend in CI. Added a test ensuring every Python code block in the documentation is either executed or explicitly opted out with a documented reason, so an invalid example can no longer go untested.
diff --git a/docs/contributing.md b/docs/contributing.md
index b9c7aa1aa2..e4906f6db5 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -12,7 +12,7 @@ If you find a bug, please raise a [GitHub issue](https://github.com/zarr-develop
 
 1. A minimal, self-contained snippet of Python code reproducing the problem. You can format the code nicely using markdown, e.g.:
 
-```python
+```python exec="false" reason="illustrative pseudocode with a '# etc.' placeholder, not runnable"
 import zarr
 g = zarr.group()
 # etc.
@@ -225,10 +225,10 @@ hatch --env docs run serve
 
 #### Adding executable code blocks in the documentation
 
-Zarr uses [Markdown Exec](https://pawamoy.github.io/markdown-exec/usage/) to execute code blocks in Markdown files. Add `exec="on"` to a code block header for it to be executed when the docs are built. For example:
+Zarr uses [Markdown Exec](https://pawamoy.github.io/markdown-exec/usage/) to execute code blocks in Markdown files. Add `exec="true"` to a code block header for it to be executed when the docs are built. For example:
 
 ````md
-```python exec="on"
+```python exec="true"
 print("Hello world")
 ```
 ````
@@ -253,6 +253,64 @@ renders as:
 print("Hello world")
 ```
 
+#### Validating code blocks: `exec` vs `test`
+
+Every Python code block in the documentation is checked by a test
+(`tests/test_docs.py`) so that examples cannot quietly rot. The motivating bug was an
+example calling `zarr.create_array(..., mode="w")`, which passes an argument that does
+not exist and went unnoticed because nothing ran it. A block declares *how* it is
+validated using one of two independent attributes:
+
+  - **`exec="true"`** — Markdown Exec runs the block **at docs-build time to render its
+    output** into the page. This is the attribute described above; it is also what the
+    test suite executes. Use it for ordinary examples whose output should appear in the
+    docs.
+  - **`test="true"`** — the block is **run by the test suite only**, *not* at build time.
+    Use this for an example that should be validated but cannot run in the docs-build
+    environment — for example one that needs a GPU or a cloud backend. Markdown Exec
+    leaves a `test="true"` block as a static, syntax-highlighted snippet (it never
+    executes it), while the test suite still runs it (see the marker note below).
+
+A block may carry both (`exec="true" test="true"`), though in practice `exec="true"`
+already implies it is tested, so you rarely need `test="true"` alongside it.
+
+The two attributes are kept separate on purpose: `exec=` controls *build-time rendering*
+and `test=` controls *test-time validation*. Tagging a GPU/cloud example `exec="true"`
+would make `mkdocs build` try to run it on a machine without that infrastructure and fail
+the build; `test="true"` lets it be validated without being built.
+
+##### Opting a block out of validation
+
+A handful of blocks are not executable Python (a REPL transcript, a `--8<--` file
+include) or should not run as written (a deliberately incorrect "before" snippet, an
+example needing a package the docs test environment lacks). Mark these explicitly by
+opening the fence with `exec="false" reason="REPL output transcript, not executable source"`
+(supply a reason that fits the block).
+
+`exec="false"` with a non-empty `reason` is an explicit, greppable opt-out. A test
+(`test_no_unvalidated_blocks`) requires **every** Python block to be one of `exec="true"`,
+`test="true"`, or `exec="false"` with a reason, so a block can never silently skip
+validation. A bare ` ```python ` fence, or any other value such as `exec="on"`, fails that test.
+
+##### Marker-bound blocks (GPU, S3)
+
+A `test="true"` block that needs special infrastructure declares a pytest marker with
+`markers="..."`, which binds it to that infrastructure in the test suite:
+
+  - `markers="gpu"`: selected by `pytest -m gpu` (the GPU CI environment); skipped via
+    `importorskip("cupy")` wherever CuPy is not installed.
+  - `markers="s3"`: run against a mock S3 (moto) backend supplied by a test fixture, so
+    the example can use a bare `s3://…` URL without showing test-only connection details.
+
+##### Placement of `test="true"` blocks
+
+Because Markdown Exec does not execute a `test="true"` (or `exec="false"`) block, placing
+one *before* an `exec="true"` block on the same page can disrupt the build-time execution
+of that later block. Put `test="true"` blocks **after** all `exec="true"` blocks on the
+page (or on a page where they are the only Python block). The `test_test_only_blocks_come_last`
+test enforces this, and the CI docs build runs with `--strict` so any such breakage fails
+the build rather than passing as a warning.
+
 #### Building documentation without executing code blocks
 
 Sometimes, you may want the documentation to build quicker. You can disable code block execution by commenting out the [markdown-exec plugin](https://github.com/zarr-developers/zarr-python/blob/884a8c91afcc3efe28b3da952be3b85125c453cb/mkdocs.yml#L132) in the mkdocs configuration file. This will make code blocks and cross references render incorrectly (i.e., expect build warnings), but also reduces build time by ~3x. Be sure to undo the commenting out before opening your pull request.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 27dc8e6045..0bad4f2e34 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -127,18 +127,6 @@ be done in a separate step.
 Zarr supports persistent storage to disk or cloud-compatible backends. While examples above
 utilized a [`zarr.storage.LocalStore`][], a number of other storage options are available.
 
-Zarr integrates seamlessly with cloud object storage such as Amazon S3 and Google Cloud Storage
-using external libraries like [s3fs](https://s3fs.readthedocs.io) or
-[gcsfs](https://gcsfs.readthedocs.io):
-
-```python
-
-import s3fs
-
-z = zarr.create_array("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10), dtype="f4")
-z[:, :] = np.random.random((100, 100))
-```
-
 A single-file store can also be created using the [`zarr.storage.ZipStore`][]:
 
 ```python exec="true" session="quickstart" source="above"
@@ -173,4 +161,18 @@ z = zarr.open_array(store, mode='r')
 print(z[:])
 ```
 
+Zarr also integrates seamlessly with cloud object storage such as Amazon S3 and Google
+Cloud Storage using external libraries like [s3fs](https://s3fs.readthedocs.io) or
+[gcsfs](https://gcsfs.readthedocs.io):
+
+```python test="true" session="s3demo" markers="s3" source="above"
+import zarr
+import numpy as np
+
+z = zarr.create_array(
+    "s3://example-bucket/foo", shape=(100, 100), chunks=(10, 10), dtype="f4"
+)
+z[:, :] = np.random.random((100, 100))
+```
+
 Read more about Zarr's storage options in the [User Guide](user-guide/index.md).
diff --git a/docs/user-guide/arrays.md b/docs/user-guide/arrays.md
index 14122003c0..dd1788b7d2 100644
--- a/docs/user-guide/arrays.md
+++ b/docs/user-guide/arrays.md
@@ -619,7 +619,7 @@ Without the `shards` argument, there would be 10,000 chunks stored as individual
     Because the feature is still stabilizing, it is disabled by default and
     must be explicitly enabled:
 
-    ```python
+    ```python exec="true" session="arrays-rectilinear"
     import zarr
     zarr.config.set({"array.rectilinear_chunks": True})
     ```
diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md
index fc812c1a20..13fcb6f1b6 100644
--- a/docs/user-guide/cli.md
+++ b/docs/user-guide/cli.md
@@ -45,9 +45,13 @@ This will write new `zarr.json` files to `input.zarr`, leaving the existing v2 m
 
 To open the array/group using the new metadata use:
 
-```python
+```python exec="true" session="cli-open" source="above"
 import zarr
-zarr_with_v3_metadata = zarr.open('path/to/input.zarr', zarr_format=3)
+
+# create a small array to open (stands in for the migrated store)
+zarr.create_array("data/cli-demo.zarr", shape=(4, 4), chunks=(2, 2), dtype="i4", overwrite=True)
+
+zarr_with_v3_metadata = zarr.open("data/cli-demo.zarr", zarr_format=3)
 ```
 
 Once you are happy with the conversion, you can run the following to remove the old v2 metadata:
diff --git a/docs/user-guide/data_types.md b/docs/user-guide/data_types.md
index 3e10845979..6f6bb05033 100644
--- a/docs/user-guide/data_types.md
+++ b/docs/user-guide/data_types.md
@@ -360,7 +360,7 @@ print(type(a.dtype))
 
 But if we inspect the metadata for the array, we can see the Zarr data type object:
 
-```python
+```python exec="false" reason="REPL output transcript, not executable source"
 type(a.metadata.data_type)
 <class 'zarr.core.dtype.npy.int.Int64'>
 ```
diff --git a/docs/user-guide/examples/custom_dtype.md b/docs/user-guide/examples/custom_dtype.md
index d6736e25dd..391407b822 100644
--- a/docs/user-guide/examples/custom_dtype.md
+++ b/docs/user-guide/examples/custom_dtype.md
@@ -2,6 +2,6 @@
 
 ## Source Code
 
-```python
+```python exec="false" reason="pymdownx snippet include directive, not python source"
 --8<-- "examples/custom_dtype/custom_dtype.py"
 ```
diff --git a/docs/user-guide/gpu.md b/docs/user-guide/gpu.md
index 6189f39d3d..6c26c3e564 100644
--- a/docs/user-guide/gpu.md
+++ b/docs/user-guide/gpu.md
@@ -16,7 +16,7 @@ Zarr can use GPUs to accelerate your workload by running `zarr.Config.enable_gpu
 [`zarr.config`][] configures Zarr to use GPU memory for the data
 buffers used internally by Zarr via `enable_gpu()`.
 
-```python
+```python test="true" session="gpu-demo" markers="gpu" source="above"
 import zarr
 import cupy as cp
 zarr.config.enable_gpu()
diff --git a/docs/user-guide/performance.md b/docs/user-guide/performance.md
index fa98e9466e..3357913557 100644
--- a/docs/user-guide/performance.md
+++ b/docs/user-guide/performance.md
@@ -204,7 +204,7 @@ determines the maximum number of concurrent I/O operations.
 The default value is 10, which is a conservative value. You may get improved performance by tuning
 the concurrency limit. You can adjust this value based on your specific needs:
 
-```python
+```python exec="true" session="perf-concurrency"
 import zarr
 
 # Set concurrency for the current session
@@ -234,7 +234,7 @@ By default it is `None`, which lets Python choose the pool size (typically
 
 You can set it explicitly when you want more predictable resource usage:
 
-```python
+```python exec="true" session="perf-workers"
 import zarr
 
 zarr.config.set({'threading.max_workers': 8})
@@ -260,7 +260,7 @@ For example, if you're running Dask with 10 threads and Zarr's default concurren
 
 **Recommendation**: When using Dask with many threads, configure Zarr's concurrency settings:
 
-```python
+```python exec="false" reason="requires dask, which is not in the docs test environment"
 import zarr
 import dask.array as da
 
diff --git a/docs/user-guide/v3_migration.md b/docs/user-guide/v3_migration.md
index 21386c1522..1680547d93 100644
--- a/docs/user-guide/v3_migration.md
+++ b/docs/user-guide/v3_migration.md
@@ -39,7 +39,7 @@ the following actions in order:
    - `numcodecs.*` will no longer be available in `zarr.*`. To migrate, import codecs
      directly from `numcodecs`:
 
-     ```python
+     ```python exec="false" reason="intentionally shows the old/incorrect import for contrast"
      from numcodecs import Blosc
      # instead of:
      # from zarr import Blosc
diff --git a/pyproject.toml b/pyproject.toml
index e342e8305c..92312d2630 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -204,6 +204,10 @@ list-env = "pip list"
 template = "test"
 extra-dependencies = [
     "universal_pathlib",
+    # Needed so tests/test_docs.py is collectable under `pytest -m gpu`; otherwise its
+    # module-level importorskip("pytest_examples") skips the whole module and the gpu
+    # docs example is never executed on GPU hardware.
+    "pytest-examples",
 ]
 features = ["gpu"]
 
@@ -277,9 +281,8 @@ readthedocs = "rm -rf $READTHEDOCS_OUTPUT/html && cp -r site $READTHEDOCS_OUTPUT
 [tool.hatch.envs.doctest]
 description = "Test environment for validating executable code blocks in documentation"
 features = ['remote']
-dependency-groups = ['test']
+dependency-groups = ['remote-tests']
 extra-dependencies = [
-    "s3fs>=2023.10.0",
     "pytest-examples",
 ]
 
@@ -446,6 +449,7 @@ filterwarnings = [
 markers = [
     "asyncio: mark test as asyncio test",
     "gpu: mark a test as requiring CuPy and GPU",
+    "s3: mark a test as requiring a (mock) S3 backend via moto",
     "slow_hypothesis: slow hypothesis tests",
 ]
 
diff --git a/tests/conftest.py b/tests/conftest.py
index 3515acace0..3402eb7063 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -531,3 +531,31 @@ def deep_nan_equal(a: object, b: object) -> bool:
     if isinstance(a, Sequence) and isinstance(b, Sequence):
         return all(deep_nan_equal(a[i], b[i]) for i in range(len(a)))
     return nan_equal(a, b)
+
+
+# Shared mock-S3 (moto) backend. A single server is reused across the whole test session by
+# every test that needs S3 -- both the fsspec store tests and the documentation examples --
+# instead of each module standing up its own. Consumers create their own buckets and choose
+# how the endpoint reaches the client (explicit storage_options vs. the AWS_ENDPOINT_URL
+# env var) on top of this fixture.
+MOTO_SERVER_PORT = 5555
+MOTO_ENDPOINT_URL = f"http://127.0.0.1:{MOTO_SERVER_PORT}/"
+
+
+@pytest.fixture(scope="session")
+def moto_server() -> Generator[str, None, None]:
+    """Start a session-scoped moto S3 server and yield its endpoint URL.
+
+    importorskip lives inside the fixture so moto is only required when a test actually
+    requests an S3 backend, not for the whole test session."""
+    moto_server_mod = pytest.importorskip("moto.moto_server.threaded_moto_server")
+
+    server = moto_server_mod.ThreadedMotoServer(ip_address="127.0.0.1", port=MOTO_SERVER_PORT)
+    server.start()
+    # moto needs *some* credentials present; use throwaway values if the environment has none.
+    os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "foo")
+    os.environ.setdefault("AWS_ACCESS_KEY_ID", "foo")
+    try:
+        yield MOTO_ENDPOINT_URL
+    finally:
+        server.stop()
diff --git a/tests/test_docs.py b/tests/test_docs.py
index d467e478e8..02dca225b0 100644
--- a/tests/test_docs.py
+++ b/tests/test_docs.py
@@ -1,95 +1,243 @@
 """
 Tests for executable code blocks in markdown documentation.
 
-This module uses pytest-examples to validate that all Python code examples
-with exec="true" in the documentation execute successfully.
+This module uses pytest-examples to validate Python code examples in the docs. A block is
+validated if it renders output at build (exec="true") or is explicitly marked for testing
+(test="true"). The two flags are separate on purpose: exec= drives markdown-exec's
+build-time rendering, while test= lets a block be validated without being run at build
+(e.g. gpu/s3 examples the build environment cannot run). The test_no_unvalidated_blocks
+guard ensures every python block declares one of those, or an explicit exec="false" opt-out
+with a reason, so a block can never silently skip validation.
 """
 
 from __future__ import annotations
 
+import os
 from collections import defaultdict
 from pathlib import Path
+from typing import TYPE_CHECKING, Any
 
 import pytest
 
 pytest.importorskip("pytest_examples")
 from pytest_examples import CodeExample, EvalExample, find_examples
 
-# Find all markdown files with executable code blocks
+if TYPE_CHECKING:
+    from collections.abc import Generator
+
 DOCS_ROOT = Path(__file__).parent.parent / "docs"
 SOURCES_ROOT = Path(__file__).parent.parent / "src" / "zarr"
 
 
-def find_markdown_files_with_exec() -> list[Path]:
-    """Find all markdown files containing exec="true" code blocks."""
-    markdown_files = []
+def name_example(path: str, session: str) -> str:
+    """Generate a readable name for a test case from file path and session."""
+    file = Path(path)
+    try:
+        file = file.relative_to(DOCS_ROOT)
+    except ValueError:
+        # Path is outside DOCS_ROOT (e.g. a tmp_path fixture in unit tests); use the
+        # bare file name rather than an absolute path for a stable, readable id.
+        file = Path(file.name)
+    return f"{file}:{session}"
 
-    for md_file in DOCS_ROOT.rglob("*.md"):
-        try:
-            content = md_file.read_text(encoding="utf-8")
-            if 'exec="true"' in content:
-                markdown_files.append(md_file)
-        except Exception:
-            # Skip files that can't be read
+
+def _marker_names(settings: dict[str, str]) -> list[str]:
+    """Parse a block's markers="a b" attribute into a list of marker names."""
+    return [name for name in settings.get("markers", "").split() if name]
+
+
+def _is_tested(settings: dict[str, str]) -> bool:
+    """A block is validated by our pytest harness if it is run at build to render output
+    (exec="true") OR explicitly marked for testing (test="true"). The two flags are
+    separate on purpose: exec= drives markdown-exec's build-time rendering, while test=
+    lets a block be validated without being run at build (e.g. gpu/s3 blocks, which the
+    build environment cannot run)."""
+    return settings.get("exec") == "true" or settings.get("test") == "true"
+
+
+def _session_params(root: Path) -> list[Any]:
+    """Group tested examples (exec="true" or test="true") by (file, session) and emit one
+    pytest.param per session, carrying the union of markers declared by that session's
+    blocks."""
+    sessions: defaultdict[tuple[str, str], list[CodeExample]] = defaultdict(list)
+    marks_by_session: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
+
+    for example in find_examples(str(root)):
+        settings = example.prefix_settings()
+        if not _is_tested(settings):
             continue
+        session_name = settings.get("session", "_default")
+        key = (str(example.path), session_name)
+        sessions[key].append(example)
+        marks_by_session[key].update(_marker_names(settings))
 
-    return sorted(markdown_files)
+    params = []
+    for key in sorted(sessions):
+        marks = tuple(getattr(pytest.mark, name) for name in sorted(marks_by_session[key]))
+        params.append(pytest.param(key, marks=marks, id=name_example(key[0], key[1])))
+    return params
 
 
-def group_examples_by_session() -> list[tuple[str, str]]:
-    """
-    Group examples by their session and file, maintaining order.
+S3_BUCKET = "example-bucket"
 
-    Returns a list of session_key tuples where session_key is
-    (file_path, session_name).
-    """
-    all_examples = list(find_examples(DOCS_ROOT))
 
-    # Group by file and session
-    sessions = defaultdict(list)
+@pytest.fixture
+def docs_s3_backend(moto_server: str) -> Generator[None, None, None]:
+    """Point docs S3 examples at the shared moto server (tests/conftest.py) via a
+    process-wide AWS_ENDPOINT_URL, so a block can use a bare s3:// URL with no
+    storage_options. The server lifecycle belongs to the session-scoped `moto_server`
+    fixture; this fixture only sets the docs-specific endpoint env var and creates a
+    fresh bucket, then on teardown resets moto state and restores the env var."""
+    s3fs = pytest.importorskip("s3fs")
+    botocore = pytest.importorskip("botocore")
+    requests = pytest.importorskip("requests")
 
-    for example in all_examples:
+    prev_endpoint = os.environ.get("AWS_ENDPOINT_URL")
+    os.environ["AWS_ENDPOINT_URL"] = moto_server
+
+    session = botocore.session.Session()
+    client = session.create_client("s3", endpoint_url=moto_server, region_name="us-east-1")
+    client.create_bucket(Bucket=S3_BUCKET)
+    client.close()
+    s3fs.S3FileSystem.clear_instance_cache()
+    try:
+        yield
+    finally:
+        # Reset moto state and restore AWS_ENDPOINT_URL; the shared server keeps running
+        # (the moto_server fixture stops it at session end).
+        try:
+            requests.post(f"{moto_server}moto-api/reset")
+        finally:
+            if prev_endpoint is None:
+                os.environ.pop("AWS_ENDPOINT_URL", None)
+            else:
+                os.environ["AWS_ENDPOINT_URL"] = prev_endpoint
+
+
+def test_markers_attribute_is_parsed(tmp_path: Path) -> None:
+    """A test="true" block tagged markers="s3" must surface that marker on its
+    parametrized case, so pytest can gate/bind it (e.g. attach the moto fixture).
+    Uses test="true" (not exec="true") because marker-bound blocks are validated by the
+    harness without being run at build time."""
+    md = tmp_path / "ex.md"
+    md.write_text(
+        '```python test="true" session="demo" markers="s3"\nimport zarr\n```\n',
+        encoding="utf-8",
+    )
+    params = _session_params(md.parent)
+    assert len(params) == 1
+    marks = params[0].marks
+    assert any(m.name == "s3" for m in marks)
+
+
+def test_no_unvalidated_blocks() -> None:
+    """Every python code block in docs/ must declare its validation state: exec="true"
+    (run at build to render output), test="true" (validated by this harness without being
+    run at build), or exec="false" with a reason (explicit, documented opt-out). A bare or
+    mistyped fence (e.g. exec="on") fails here, so a block can never silently opt out of
+    validation -- the gap that hid the invalid create_array(mode="w") example in #4016.
+
+    A separate placement constraint is enforced by test_test_only_blocks_come_last."""
+    offenders: list[str] = []
+    for example in find_examples(str(DOCS_ROOT)):
+        rel = Path(example.path).relative_to(DOCS_ROOT)
         settings = example.prefix_settings()
-        if settings.get("exec") != "true":
+        exec_val = settings.get("exec")
+        loc = f"{rel}:{example.start_line}"
+        # Validated either by build-render (exec="true") or by the test harness
+        # (test="true").
+        if _is_tested(settings):
+            continue
+        # Explicit, documented opt-out from execution.
+        if exec_val == "false" and settings.get("reason", "").strip():
             continue
+        offenders.append(
+            f"{loc} (exec={exec_val!r}, test={settings.get('test')!r}, "
+            f"reason={settings.get('reason')!r})"
+        )
 
-        # Use file path and session name as key
-        file_path = example.path
-        session_name = settings.get("session", "_default")
-        session_key = (str(file_path), session_name)
+    assert not offenders, (
+        'Docs python blocks must be exec="true", test="true", or exec="false" with a '
+        "reason:\n" + "\n".join(offenders)
+    )
 
-        sessions[session_key].append(example)
 
-    # Return sorted list of session keys for consistent test ordering
-    return sorted(sessions.keys(), key=lambda x: (x[0], x[1]))
+def test_test_only_blocks_come_last() -> None:
+    """A conservative placement convention: a test="true"-only block must come after every
+    exec="true" block in the same file.
 
+    Mechanism (established by experiment + markdown-exec's SuperFences integration): a
+    python fence that markdown-exec does not execute -- i.e. one lacking exec="true",
+    whether test="true" or exec="false" -- placed before an exec="true" block disrupts
+    markdown-exec's build-time execution of a *later, state-dependent* block. Observed: a
+    non-exec python fence inserted before the quickstart ZipStore write/read pair made the
+    read block fail with FileNotFoundError (the write never took effect), aborting
+    `mkdocs build --strict`. The effect needs a cross-block dependency to surface, so it
+    does not affect the standalone exec="true" blocks in e.g. data_types.md/performance.md
+    that already have exec="false" opt-out blocks above them.
 
-def name_example(path: str, session: str) -> str:
-    """Generate a readable name for a test case from file path and session."""
-    return f"{Path(path).relative_to(DOCS_ROOT)}:{session}"
+    Because we cannot statically tell which later blocks are state-dependent, this guard
+    enforces the simple, safe convention only for the blocks we author this way
+    (test="true" marker-bound examples like s3/gpu). It is NOT a complete build-hazard
+    check -- the authoritative check is `mkdocs build --strict` (the docs:check CI job),
+    which catches the exec="false" case too. This guard just turns the common test-only
+    case into a fast, local failure."""
+    # Collect, per published-docs file, the start lines of test-only and exec blocks.
+    test_only: defaultdict[str, list[int]] = defaultdict(list)
+    exec_lines: defaultdict[str, list[int]] = defaultdict(list)
+    for example in find_examples(str(DOCS_ROOT)):
+        settings = example.prefix_settings()
+        path = str(example.path)
+        if settings.get("exec") == "true":
+            exec_lines[path].append(example.start_line)
+        elif settings.get("test") == "true":
+            test_only[path].append(example.start_line)
+
+    offenders: list[str] = []
+    for path, only_lines in test_only.items():
+        rel = Path(path).relative_to(DOCS_ROOT)
+        last_exec = max(exec_lines.get(path, [0]))
+        offenders.extend(
+            f'{rel}:{line} (test="true" block precedes an exec="true" block at line {last_exec})'
+            for line in only_lines
+            if line < last_exec
+        )
+
+    assert not offenders, (
+        'A test="true"-only block must come after every exec="true" block in the same '
+        'file: a non-executed python fence before an exec="true" block can disrupt '
+        "markdown-exec's build-time execution of a later state-dependent block (see this "
+        "test's docstring):\n" + "\n".join(offenders)
+    )
 
 
 # Get all example sessions
-@pytest.mark.parametrize(
-    "session_key", group_examples_by_session(), ids=lambda v: name_example(v[0], v[1])
-)
+@pytest.mark.parametrize("session_key", _session_params(DOCS_ROOT))
 def test_documentation_examples(
     session_key: tuple[str, str],
     eval_example: EvalExample,
+    request: pytest.FixtureRequest,
 ) -> None:
     """
-    Test that all exec="true" code examples in documentation execute successfully.
+    Test that all validated code examples (exec="true" or test="true") in documentation
+    execute successfully.
 
     This test groups examples by session (file + session name) and runs them
     sequentially in the same execution context, allowing code to build on
     previous examples.
 
     This test uses pytest-examples to:
-    - Find all code examples with exec="true" in markdown files
+    - Find all code examples marked exec="true" or test="true" in markdown files
     - Group them by session
     - Execute them in order within the same context
     - Verify no exceptions are raised
     """
+    if request.node.get_closest_marker("gpu") is not None:
+        pytest.importorskip("cupy")
+
+    if request.node.get_closest_marker("s3") is not None:
+        request.getfixturevalue("docs_s3_backend")
+
     file_path, session_name = session_key
 
     # Get examples for this session
@@ -97,7 +245,7 @@ def test_documentation_examples(
     examples = []
     for example in all_examples:
         settings = example.prefix_settings()
-        if settings.get("exec") != "true":
+        if not _is_tested(settings):
             continue
         if str(example.path) == file_path and settings.get("session", "_default") == session_name:
             examples.append(example)
diff --git a/tests/test_store/test_fsspec.py b/tests/test_store/test_fsspec.py
index 8006470174..9efee40d7a 100644
--- a/tests/test_store/test_fsspec.py
+++ b/tests/test_store/test_fsspec.py
@@ -1,7 +1,6 @@
 from __future__ import annotations
 
 import json
-import os
 import re
 from typing import TYPE_CHECKING, Any
 
@@ -10,6 +9,7 @@
 from packaging.version import parse as parse_version
 
 import zarr.api.asynchronous
+from tests.conftest import MOTO_ENDPOINT_URL
 from zarr import Array
 from zarr.abc.store import OffsetByteRequest
 from zarr.core.buffer import Buffer, cpu, default_buffer_prototype
@@ -52,31 +52,23 @@
 fsspec = pytest.importorskip("fsspec")
 s3fs = pytest.importorskip("s3fs")
 requests = pytest.importorskip("requests")
-moto_server = pytest.importorskip("moto.moto_server.threaded_moto_server")
-moto = pytest.importorskip("moto")
+# Skip this module entirely when moto is absent; the server itself comes from the shared
+# `moto_server` fixture in tests/conftest.py.
+pytest.importorskip("moto")
 botocore = pytest.importorskip("botocore")
 
 # ### amended from s3fs ### #
 test_bucket_name = "test"
 secure_bucket_name = "test-secure"
-port = 5555
-endpoint_url = f"http://127.0.0.1:{port}/"
+# The moto server itself is the session-scoped `moto_server` fixture in tests/conftest.py;
+# this module reuses its endpoint rather than standing up its own server.
+endpoint_url = MOTO_ENDPOINT_URL
 
 
-@pytest.fixture(scope="module")
-def s3_base() -> Generator[None, None, None]:
-    # writable local S3 system
-
-    # This fixture is module-scoped, meaning that we can reuse the MotoServer across all tests
-    server = moto_server.ThreadedMotoServer(ip_address="127.0.0.1", port=port)
-    server.start()
-    if "AWS_SECRET_ACCESS_KEY" not in os.environ:
-        os.environ["AWS_SECRET_ACCESS_KEY"] = "foo"
-    if "AWS_ACCESS_KEY_ID" not in os.environ:
-        os.environ["AWS_ACCESS_KEY_ID"] = "foo"
-
-    yield
-    server.stop()
+@pytest.fixture
+def s3_base(moto_server: str) -> str:
+    """Reuse the shared session-scoped moto server (see tests/conftest.py)."""
+    return moto_server
 
 
 def get_boto3_client() -> botocore.client.BaseClient: