feat(#1423): Diagram.trace() for upstream restriction propagation#1471
Conversation
Implements T2.2.a of the provenance trinity (datajoint-docs#183). Upstream mirror of Diagram.cascade(): walks the FK graph from a restricted seed to every ancestor with OR convergence — an ancestor entity is included if reachable through any FK path from the seed. Reuses the upward propagation primitives added by #1468 (_apply_propagation_rule_upward / _find_real_edge_props) applied here in a generalized form (any child → any parent, not just Part → Master). Branch note: stacked on fix/1429-cascade-part-part-renamed-fk (#1468) for the upward primitives. Will rebase onto master after #1468 lands. What's added: - src/datajoint/diagram.py: - New @classmethod Diagram.trace(table_expr) — mirror of cascade(), walks ancestors instead of descendants, trims to ancestor subgraph. - New _propagate_restrictions_upstream(start_node) — multi-pass walk over in_edges, applies the upward rules at each real edge. Alias-node transparent. - New __getitem__(key) — supports both Table subclass/instance (returns pre-restricted QueryExpression) and string (returns pre-restricted FreeTable). Raises DataJointError for tables outside the trace's subgraph. - Bugfix in _apply_propagation_rule_upward Backward Rule 3: previous code projected child to its OWN PK (child_ft.proj()) which excluded non-primary FK columns. Now projects to the FK columns via proj(*attr_map.keys()), correctly carrying them into the parent restriction for non-primary-FK cases. Caught by test_trace_or_convergence_two_paths. - src/datajoint/dependencies.py: - New load_all_upstream() — symmetric to load_all_downstream. Iteratively discovers upstream schemas reachable via reverse FK edges, expanding the graph until convergence. - src/datajoint/adapters/{base,mysql,postgres}.py: - New find_upstream_schemas_sql(schemas_list) on each adapter, symmetric to find_downstream_schemas_sql. - tests/integration/test_trace.py (new, 8 tests covering single-hop, multi-hop, renamed FK, OR convergence across two paths, non-ancestor rejection, string indexing → FreeTable, counts(), leaf-table seed). All 8 trace tests pass on MySQL. Regression: test_cascade_delete + test_cascading_delete + test_dependencies + test_semantic_matching — 36 tests pass, no regressions from the Rule 3 fix. Slated for DataJoint 2.3.
The trace-mode __getitem__ I added shadowed networkx.DiGraph's standard
adjacency-dict lookup for ALL Diagrams, not just trace results. ERD
tests (and any other code that does diagram[node_name] for adjacency)
were getting DataJointError("not in this trace's subgraph") instead of
the adjacency dict.
Fix: short-circuit non-trace diagrams (no _mode attribute or _mode != "trace")
to super().__getitem__(key) before any trace-specific logic runs.
Tests:
- 5 previously-failing erd tests now pass (test_erd, test_diagram_algebra,
test_repr_svg, test_make_image, test_part_table_parsing).
- 8/8 trace tests still pass.
Implements T2.2.c of the provenance trinity, completing the trio (Diagram.trace → self.upstream → strict_provenance). When dj.config["strict_provenance"] = True, runtime gates enforce the upstream-only convention inside make(): - Reads must target a table in the active trace's allowed set (declared ancestors + self + self's Parts). - Writes must target self or self's Parts. - Inserted rows' PK columns that overlap with the current key must equal the key's values (key-consistency rule). Default is False. Existing make() bodies are unaffected. Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace (#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase onto master after the chain merges. What's added: - src/datajoint/provenance.py (new): the runtime context module. - `_active_strict_make` ContextVar holding (target, allowed_tables, key) for the currently-executing make() invocation. ContextVar chosen over threading.local to propagate correctly across contextvars-aware concurrency boundaries. - `push_strict_make_context` / `pop_strict_make_context` — context lifecycle managed by `_populate_one`'s try/finally. - `assert_read_allowed(query_expression)` — read gate. Recursively discovers base tables via the QueryExpression's `_support` chain and checks each against the allowed set. - `assert_write_allowed(target_table, rows)` — write gate. Verifies the target is self or one of self's Part tables, and checks the key-consistency rule on each dict row. - src/datajoint/settings.py: new `strict_provenance: bool` field on Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING entry. - src/datajoint/autopopulate.py: in `_populate_one`, push the strict context (when the flag is on) just before the make() invocation block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name} ∪ {self's Parts}. Pop in the existing `finally` block. - src/datajoint/expression.py: `QueryExpression.cursor` now calls `assert_read_allowed(self)` before issuing SQL. No-op outside make(). - src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)` after the existing `_allow_insert` check. No-op outside make(). Part-table detection uses class `__dict__` traversal (filtered to Part subclasses) instead of `dir/getattr` to avoid triggering the `_JobsDescriptor` (which would lazy-declare ~~table inside the populate transaction — caught by the first test iteration). Documented limitation (deferred): the read gate does not distinguish reads that came through `self.upstream` from reads of the same ancestor via a direct expression. Both are allowed if the table is in the allowed set. The intent is to catch reads from *undeclared* dependencies; tightening the "must come through self.upstream" path requires propagating an attribution marker through QueryExpression composition and is left for a follow-up release. Tests in tests/integration/test_strict_provenance.py (6 new): - test_strict_compliant_make_passes — make() reading via self.upstream and writing self.insert1 with matching key runs cleanly under strict. - test_strict_blocks_read_from_undeclared_table — read from an unrelated table raises with "strict_provenance ... undeclared" message. - test_strict_blocks_write_to_other_table — insert into a non-self, non-Part target raises "not permitted". - test_strict_blocks_write_with_mismatched_key — row PK that disagrees with the current key raises "does not match the current make() key". - test_strict_writes_to_part_table_pass — self.PartName.insert(...) works. - test_strict_off_by_default_no_change — default-off regression check; the canonical "direct (Ancestor & key).fetch1()" pattern still works when strict_provenance is unset. Regression: 17/17 autopopulate tests pass with strict_provenance unset (default). 6/6 new strict tests pass with strict_provenance=True. 8/8 trace tests + 9/9 cascade tests unaffected. Slated for DataJoint 2.3.
|
I traced the core logic against the forward One thing I'd add before merge: coverage on the cases that most stress the design's novelty. A regression in the reverse-topo walk or
Not blocking — the logic is sound — but for the foundation of the trinity I think these are worth having. |
MilagrosMarin
left a comment
There was a problem hiding this comment.
Reviewed 03dd0a7b on top of the already-approved #1468. All four coverage gaps @ttngu207 flagged still hold on the current tip.
Verified ✅
- Structure.
_propagate_restrictions_upstream(diagram.py:468) walksin_edgesin reverse topo order (children before parents), applies_apply_propagation_rule_upwardat each real edge, transparently steps through alias nodes. Multi-pass loop terminates on DAG. - OR-convergence. Multiple FK paths append to
_cascade_restrictions[parent];_restricted_tableOR-combines the list viaft.restrict(list)→ OrList. Same storage shape as cascade; OR semantics fall out of the reuse. - Rule 3 fix (
child_ft.proj()→child_ft.proj(*attr_map.keys())atdiagram.py:896). Old code projected to child's own PK, so non-primary FK columns dropped out and the parent's natural-join found no shared cols — restriction silently lost. New code carries FK columns. Cascade regression (36/36) confirms unaffected: Rule 3 only fires whenchild_attrs ⊄ parent_pk, which is rare in cascade's Part-to-Master walks. __getitem__gate on_mode == "trace". Non-trace diagrams fall through to networkx adjacency lookup; existing patterns keep working.load_all_upstream. Structural mirror ofload_all_downstream. Both adapter SQLs reverse the FK-schema query correctly.
Coverage gaps (concur with @ttngu207)
Verified by inspection of tests/integration/test_trace.py:
- Multi-hop diamond OR-convergence — the OR test uses two adjacent-edge paths; a
Leaf → {Middle1, Middle2} → Rootshape would exercise multi-pass accumulation. - Cross-schema ancestors — no test spans two schemas;
load_all_upstream's discovery path is unexercised. - Upward Part-of-Part chains —
Partdoesn't appear in the file. - Direct Rule 3 hit test — only exercised inside the mixed OR test; a single-hop non-primary-FK trace with an explicit assertion would guard the exact code path this PR touches.
Not blocking, but worth having before the trinity's later pieces (self.upstream, strict_provenance) build on trace — a subset-shaped regression here would silently loosen provenance downstream.
Small observations
- Docstring drift. Rule 3's numbered docstring (
diagram.py:866) still sayschild.proj()— text should mirror the fixed code. - Alias transparency reimplemented inline in
_propagate_restrictions_upstream(lines 508–530) rather than reusing_find_real_edge_propsatdiagram.py:1007. Small refactor opportunity for future maintenance.
Bottom line
Substantive logic sound. Multi-hop diamond and cross-schema tests would be worth adding before merge; the other two coverage items and the observations can be follow-ups. Approving with the same non-blocking stance as @ttngu207.
…cstring Addresses the before-merge coverage @ttngu207 and @MilagrosMarin flagged on #1471 (the two they singled out as worth adding before merge): - test_trace_multi_hop_diamond_or_convergence: an ancestor reached via two MULTI-HOP arms (Leaf -> {Left, Right} -> Root), asserting the OR-union {1, 2}. Guards the multi-pass accumulation in the reverse-topo walk, where a regression would silently drop an arm and yield a subset. - test_trace_cross_schema_ancestor: seed and ancestor in different schemas, exercising load_all_upstream's unloaded-ancestor-schema discovery. Runs on both MySQL and PostgreSQL via schema_by_backend. Also fixes the Backward Rule 3 docstring drift (diagram.py): it described child.proj() but the code projects child.proj(*attr_map.keys()) to carry the FK columns. Upward Part-of-Part chains and an isolated non-aliased secondary-FK test remain as noted follow-ups.
2d81534
MilagrosMarin
left a comment
There was a problem hiding this comment.
2d81534a lands both coverage gaps we flagged before-merge (multi-hop diamond, cross-schema ancestor) plus the Rule 3 docstring fix. The diamond uses a renamed root_id2 on Right to force real multi-hop OR reconvergence, and excluded Root=3 proves it. Part-of-Part upward + isolated non-primary-FK explicitly deferred.
Re-approving.
…verage Addresses the review feedback from @ttngu207 and @MilagrosMarin on #1473: 1. test_upstream_cleared_after_make now captures the actual populate instance inside make() and asserts its _upstream is cleared afterward. The old test probed a fresh Derived().upstream (never set -> class default None), so it passed even if the finally-reset were deleted. Now it has teeth. 2. test_upstream_cleared_after_make_raises (new): forces make() to raise and asserts the populate instance's upstream is still cleared -- exercising the exception path the finally block exists for (previously zero coverage). 3. test_upstream_seen_across_tripartite_make now asserts, via id(self._upstream) grouped per key, that all three phases share one upstream object -- proving the docstring's 'constructed once, shared across phases' claim instead of only checking the result. 4. test_upstream_rejects_non_ancestor now exercises both the class-form and string-form Diagram.__getitem__ branches. Validated locally on MySQL and PostgreSQL (full test_autopopulate.py: 18 passed, 2 skipped). Upward Part-of-Part and isolated-secondary-FK coverage remain follow-ups (the diamond-OR and cross-schema gaps were closed at the trace layer in #1471).
Summary
T2.2.a of the provenance trinity (datajoint-docs#183 spec). Upstream mirror of `Diagram.cascade()`: walks the FK graph from a restricted seed to every ancestor with OR convergence — an ancestor entity is included if reachable through any FK path from the seed.
Closes #1423. Slated for DataJoint 2.3.
Branch dependency
This branch is stacked on `fix/1429-cascade-part-part-renamed-fk` (#1468) for the upward propagation primitives (`_apply_propagation_rule_upward`, `_find_real_edge_props`). After #1468 merges, this branch will be rebased onto master before review.
What's added
Tests
Sequencing
This is T2.2.a. After it merges:
Test plan