Implement graph-driven cascade delete and restrict on Diagram #1407
dimitri-yatsenko merged 50 commits into master from
Conversation
- Unrestricted nodes are not affected by operations
- Multiple restrict() calls create separate restriction sets
- Delete combines sets with OR (any taint → delete)
- Export combines sets with AND (all criteria → include)
- Within a set, multiple FK paths combine with OR (structural)
- Added open questions on lenient vs strict AND and same-table restrictions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Delete: one restriction, propagated downstream only, OR at convergence
- Export: downstream + upstream context, AND at convergence
- Removed over-engineered "multiple restriction sets" abstraction
- Clarified alias nodes (same parent, multiple FKs) vs convergence (different parents)
- Non-downstream tables: excluded for delete, included for export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- cascade(): OR at convergence, downstream only — for delete
- restrict(): AND at convergence, includes upstream context — for export
- Both propagate downstream via attr_map, differ only at convergence
- Table.delete() internally constructs diagram.cascade()
- part_integrity is a parameter of cascade(), not delete()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Table.drop() rewritten as Diagram(table).drop()
- Shared infrastructure: reverse topo traversal, part_integrity pre-checks, unloaded-schema error handling, preview
- drop is DDL (no restrictions), delete is DML (with cascade restrictions)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the error-driven cascade in Table.delete() (~200 lines) with graph-driven restriction propagation on Diagram. Table.delete() and Table.drop() now delegate to Diagram.cascade().delete() and Diagram.drop() respectively.

New Diagram methods:
- cascade(table_expr) — OR at convergence, one-shot, for delete
- restrict(table_expr) — AND at convergence, chainable, for export
- delete() — execute cascade delete in reverse topo order
- drop() — drop tables in reverse topo order
- preview() — show affected tables and row counts
- _from_table() — lightweight factory for Table.delete/drop

Restructure: single Diagram(nx.DiGraph) class always defined. Only visualization methods gated on diagram_active.

Resolves #865, #1110.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolve conflicts in diagram.py and table.py:
- Adopt master's config access pattern (self._connection._config)
- Keep graph-driven cascade/restrict implementation
- Apply master's declare() config param, split_full_table_name(), _config in store context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add assert after conditional config import to narrow type for mypy (filepath.py, attach.py)
- Add Any type annotation to untyped config parameters (hash_registry.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cascade restrictions stored as plain lists (for OR semantics) were
being directly assigned to ft._restriction, causing list objects to
be stringified as Python repr ("[' condition ']") in SQL WHERE clauses.
Use restrict_in_place() which properly handles lists as OR conditions
through the standard restrict() path. Also fix version string to be
PEP 440 compliant.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
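As a toy illustration of the bug described above (hypothetical helper, not DataJoint's actual `make_condition`/`restrict_in_place`): a list restriction must be joined into an OR clause, whereas stringifying the list emits its Python repr, which is invalid SQL.

```python
def or_condition(restriction):
    # A list restriction means OR of its members; str(list) would emit
    # invalid SQL like "['a=1', 'b=2']" instead of "(a=1) OR (b=2)".
    if isinstance(restriction, list):
        return "(" + ") OR (".join(restriction) + ")"
    return restriction
```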
The delete() pre-check for part_integrity="enforce" was hardcoded and did not respect the part_integrity parameter passed to cascade(). Also, explicitly deleting from a part table (e.g. Website().delete()) would always fail because the cascade seed is the part itself and its master is never in the cascade graph. Fix: store _part_integrity and _cascade_seed during cascade(), only run the enforce check when part_integrity="enforce", and skip the seed node since it was explicitly targeted by the caller. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pre-check on the cascade graph was too conservative — it flagged part tables that appeared in the graph but had zero rows to delete. The old code checked actual deletions within a transaction. Replace the graph-based pre-check with a post-hoc check on deleted_tables (tables that actually had rows deleted). If a part table had rows deleted without its master also having rows deleted, roll back the transaction and raise DataJointError. This matches the original part_integrity="enforce" semantics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
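A minimal sketch of the post-hoc semantics described above, with hypothetical data shapes (DataJoint's internals differ): the check only fires when a part table actually lost rows while its master did not.

```python
def check_part_integrity(deleted_tables, part_to_master):
    """Post-hoc part-integrity check on tables that actually had rows
    deleted. A part table in the cascade graph with zero deleted rows
    is harmless and is not flagged (unlike the old graph-based pre-check)."""
    for part, master in part_to_master.items():
        if part in deleted_tables and master not in deleted_tables:
            raise RuntimeError(
                f"rows deleted from part {part!r} without its master {master!r}"
            )
```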
…ion_attributes FreeTable._restriction_attributes is None by default. The property accessor initializes it to set() on first access. The make_condition call in part_integrity="cascade" upward propagation was using the private attribute directly, causing AttributeError when columns=None. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds prune() method that removes tables with zero matching rows from the diagram. Without prior restrictions, removes physically empty tables. With restrictions (cascade or restrict), removes tables where the restricted query yields zero rows. Returns a new Diagram. Includes 5 integration tests: unrestricted prune, prune after restrict, prune after cascade, idempotency, and prune-then-restrict chaining. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
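A toy sketch of the prune semantics (plain dicts standing in for the Diagram's graph; not the actual implementation): drop zero-count tables and keep only edges between surviving tables, returning a new graph.

```python
def prune(children, counts):
    """Return a new graph keeping only tables with nonzero (restricted)
    row counts. children: table -> set of child tables; counts: table -> int."""
    keep = {t for t in children if counts.get(t, 0) > 0}
    return {t: children[t] & keep for t in keep}
```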
- Add prune() method to both spec and design docs
- Rename _propagate_to_children → _propagate_restrictions + _apply_propagation_rule
- Fix delete() part_integrity: post-check with rollback, not pre-check
- Add _part_integrity instance attribute
- Update files affected, verification, and implementation phases
- Mark open questions as resolved with actual decisions
- Mark export/restore as future work

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove process artifacts (implementation phases, verification checklists, resolved decisions, files-changed tables). Both documents now describe the current system as-is, ready for migration into datajoint-docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ost-check part integrity Replace direct `_restriction` assignment with `restrict()` calls in Diagram so that AndList and QueryExpression objects are converted to valid SQL via `make_condition()`. Cascade delete uses OR convergence (a row is deleted if ANY FK reference points to a deleted row), while restrict/export uses AND. Part integrity enforcement uses a data-driven post-check: only raises when rows were actually deleted from a Part without its master also being deleted. This avoids false positives when a Part table appears in the cascade graph but has zero affected rows. Also adds dry_run support to delete()/drop(), prune() method, fixes CLI test subprocess invocation, and updates test fixtures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge restricted-diagram.md and restricted-diagram-spec.md into a single document reflecting the final implementation: _restrict_freetable for SQL generation, OR/AND convergence semantics, data-driven part_integrity post-check, and dry_run support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move OR/AND convergence logic into a single Diagram method that returns a FreeTable with the diagram's restrictions already applied. Callers no longer need to know about modes or pass restriction lists explicitly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These warnings originate from matplotlib's internal pyparsing usage (_fontconfig_pattern.py, _mathtext.py), not from datajoint code. Filter them in pytest config to reduce noise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI environment uses a newer pyparsing that doesn't have PyparsingDeprecationWarning. Use a message-based DeprecationWarning filter scoped to matplotlib instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PyparsingDeprecationWarning only occurs in older matplotlib versions. CI uses a newer version where it doesn't exist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- cascade() now documents graph trimming step (step 4)
- delete() walks all non-alias nodes (graph already trimmed)
- _restrict_freetable() renamed to _restricted_table() (instance method)
- Sharpen distinction between cascade (delete) and restrict (subset)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diagram is now purely a graph computation and inspection tool (cascade, restrict, preview, prune). All mutation logic — transaction management, SQL execution, prompts — lives in Table.delete() and Table.drop(). Remove design docs superseded by datajoint-docs specs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regex matching index lines in table definitions required exact end-of-line after the closing paren, rejecting valid declarations like `index(y, z) # for efficient coronal slice queries`. Updated regex to accept optional trailing comments and strip them before passing to compile_index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
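An illustrative pattern in the spirit of the fix (an assumed shape, not the actual regex in declare.py): accept an optional trailing `#` comment after the closing paren while still rejecting other trailing junk.

```python
import re

# Assumed form of an index-line pattern with optional trailing comment.
INDEX_RE = re.compile(
    r"^\s*(?:unique\s+)?index\s*\((?P<args>[^)]*)\)\s*(?:#.*)?$",
    re.IGNORECASE,
)
```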
Diagram now supports Python iteration protocol, yielding FreeTable objects in topological order. Table.delete() and Table.drop() use reversed(diagram) instead of manual topo_sort loops. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
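The topological-order iteration can be sketched with the standard library's `graphlib` (hypothetical table names; DataJoint uses networkx internally): delete order is simply the reversed topological order, leaves first.

```python
from graphlib import TopologicalSorter

# Toy FK dependency map: table -> set of its parents.
deps = {"session": {"subject"}, "trial": {"session"}}

topo = list(TopologicalSorter(deps).static_order())  # parents first
delete_order = list(reversed(topo))                  # leaves first, like reversed(diagram)
```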
Avoids confusion with QueryExpression.preview() which shows table contents. Diagram.counts() returns row counts per table. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The custom __iter__ only yields from nodes_to_show, creating a chicken-and-egg problem: when __init__ used 'for node in self:' to populate nodes_to_show for a schema source, nodes_to_show was still empty so __iter__ yielded nothing, leaving nodes_to_show empty. Fix: replace 'for node in self:' with 'for node in self.nodes()' which calls the inherited nx.DiGraph node iterator directly, independent of nodes_to_show. Fixes test_erd and all Diagram tests that failed due to empty graphs. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
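The chicken-and-egg can be reproduced with a minimal stand-in class (hypothetical names; `nodes()` plays the role of the inherited `nx.DiGraph` iterator):

```python
class Graph:
    def __init__(self, nodes):
        self._all = list(nodes)     # stands in for inherited DiGraph node storage
        self.nodes_to_show = set()

    def nodes(self):
        return list(self._all)

    def __iter__(self):
        # custom iterator filters by nodes_to_show
        return iter(self.nodes_to_show)

    def populate_buggy(self):
        # 'for node in self' consults nodes_to_show, which is still empty
        for node in self:
            self.nodes_to_show.add(node)

    def populate_fixed(self):
        # iterate the underlying node storage, independent of nodes_to_show
        for node in self.nodes():
            self.nodes_to_show.add(node)
```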
…iagram

# Conflicts:
#	pixi.lock
1. add_parts(): strip both backticks and double quotes from
identifiers so part-table detection works on PostgreSQL.
2. Extract _split_full_name() helper replacing 8 instances of the
fragile full_name.replace('"', '`').split('`') pattern in
visualization/collapse methods. Works with both quoting styles.
3. Fix error message in __init__: repr(source) not repr(source[0]) —
source is a schema/module, not a sequence.
4. Initialize _part_integrity="enforce" in __init__ and _from_table
instead of relying on getattr fallback in the copy constructor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove _split_full_name() from diagram.py and use the canonical adapter.split_full_table_name() method throughout. All call sites are instance methods with access to self._connection.adapter. Eliminates duplicate name-parsing logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace two more ad-hoc name-splitting implementations:
- FreeTable.__init__: was s.strip('`"') for s in name.split(".")
- Diagram.add_parts.is_part: same pattern
Both now use the canonical adapter.split_full_table_name().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the last ad-hoc name split in the codebase:
full_table_name.split(".")[-1].strip('`"') -> split_full_table_name()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
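A minimal sketch of what such a canonical splitter does (the real `adapter.split_full_table_name()` signature may differ): strip both MySQL backticks and PostgreSQL double quotes in one place, instead of ad-hoc `strip('`"')` calls scattered across call sites.

```python
def split_full_table_name(full_name):
    """Split '`schema`.`table`' or '"schema"."table"' into (schema, table),
    handling both MySQL and PostgreSQL identifier quoting."""
    schema, _, table = full_name.partition(".")
    return schema.strip('`"'), table.strip('`"')
```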
The safemode prompt already provides a safer preview-and-confirm workflow: it executes within a transaction, shows all affected tables and row counts, and rolls back if the user declines. dry_run was a weaker alternative (pre-transaction count that could be stale) that added a union return type and dead parameters. For programmatic preview, use Diagram.cascade().counts() directly. The test_delete_dry_run, test_drop_dry_run, and test_part_delete_dry_run tests are replaced with test_delete_preview_with_counts which tests the Diagram.cascade().counts() API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cascade preview pattern is now a single call:
dj.Diagram.cascade(Session & 'subject_id=1').counts()
cascade() constructs the Diagram directly from the table
expression, includes all descendants (cross-schema), propagates
restrictions, and trims to the affected subgraph.
Table.delete() uses Diagram.cascade(self, ...) internally.
Table.drop() expands descendants inline via nx.descendants().
Removes _from_table() — no longer needed. Also removes dry_run
from delete() and drop() since safemode's transaction + rollback
provides a safer preview, and Diagram.cascade().counts() provides
programmatic preview.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead _source attribute (set but never read)
- Update counts() error message to reference Diagram.cascade()
- Update restrict() error message and docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
prune() removes tables with zero matching rows from the diagram. For cascade (delete), this is unsafe: between cascade computation and the actual DELETE, concurrent inserts could add rows to a pruned table, causing FK errors during delete. Zero-count tables in the cascade are harmless — delete_quick() on an empty result is a no-op. prune() now raises DataJointError on cascade-produced Diagrams. It remains valid for restrict() (export subsetting) and unrestricted Diagrams (showing populated tables). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
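The mutual-exclusion rule can be sketched with a toy class (illustrative only, not DataJoint's Diagram): prune is rejected on a cascade-mode diagram because a zero-count table could be repopulated concurrently before the DELETE runs.

```python
class ToyDiagram:
    def __init__(self, mode=None):
        self._mode = mode  # "cascade", "restrict", or None

    def prune(self):
        if self._mode == "cascade":
            # Unsafe: concurrent inserts could repopulate a pruned table
            # between cascade computation and the actual DELETE.
            raise RuntimeError("prune() is not valid on a cascade Diagram")
        return self
```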
from .expression import QueryExpression
from .heading import Heading
from .staged_insert import staged_insert1 as _staged_insert1
from .utils import get_master, is_camel_case, user_choice
Can get_master be fully removed? Not just as an import, but is it dead code now?
Yes — removed. get_master() in utils.py had no remaining callers after the switch to extract_master() from dependencies.py.
src/datajoint/diagram.py
Outdated
restrictions = self._cascade_restrictions if mode == "cascade" else self._restrict_conditions

# Multiple passes to handle part_integrity="cascade" upward propagation
max_passes = 10
Replaced with a while any_new: loop. Termination is guaranteed because the dependency graph is a DAG and propagated_edges prevents re-processing the same edge. The loop only needs multiple passes when part_integrity='cascade' triggers upward propagation from a part to its master — in practice 2-3 passes at most.
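The replacement loop can be sketched as follows (toy data shapes, not the actual `_propagate_restrictions` code): each FK edge is processed at most once via `propagated_edges`, so the `while any_new:` loop terminates without a magic pass limit.

```python
def propagate(edges, seed):
    """Fixed-point propagation over FK edges (parent, child) from a seed
    table. propagated_edges guarantees each edge is processed exactly once,
    so termination holds even when a later pass unlocks new parents."""
    restricted = {seed}
    propagated_edges = set()
    any_new = True
    while any_new:
        any_new = False
        for parent, child in edges:
            if parent in restricted and (parent, child) not in propagated_edges:
                propagated_edges.add((parent, child))
                restricted.add(child)
                any_new = True
    return restricted
```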
src/datajoint/table.py
Outdated
        "deleting.".format(child=match["child"])
    ) from None
raise DataJointError("Delete blocked by FK in unloaded/inaccessible schema.") from None
except:
Can this at least catch runtime errors instead of bare except?
Fixed — changed to except Exception:. This lets KeyboardInterrupt and SystemExit propagate without attempting cancel_transaction on a dying process.
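The distinction can be shown with a small sketch (hypothetical helper names): `except Exception` rolls back on runtime errors, while `KeyboardInterrupt` and `SystemExit` derive from `BaseException` and propagate without triggering the rollback call.

```python
def run_in_transaction(action, cancel_transaction):
    """Roll back on runtime errors only; KeyboardInterrupt/SystemExit
    (BaseException subclasses) bypass the except clause entirely."""
    try:
        action()
    except Exception:
        cancel_transaction()
        raise
```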
src/datajoint/table.py
Outdated
        "deleting.".format(child=match["child"])
    ) from None
raise DataJointError("Delete blocked by FK in unloaded/inaccessible schema.") from None
except:
Can this be scoped instead of a bare except?
Addressed above — except Exception:.
propagation rules at each edge. Only processes descendants of
start_node to avoid duplicate propagation when chaining.
"""
from .table import FreeTable
Intentional — diagram.py and table.py have a circular import dependency (Diagram uses FreeTable, Table.delete() uses Diagram). The inline from .table import FreeTable inside the method body is the standard pattern used throughout the codebase to break this cycle.
Previously, cascade delete and drop only traversed tables in explicitly activated schemas. If a dependent table lived in an unactivated schema (common in multi-schema pipelines), it was invisible to the dependency graph, causing FK errors at delete time. New Dependencies.load_all_downstream() method iteratively discovers schemas that reference the loaded schemas via FK relationships, expanding the dependency graph until all downstream schemas are included. Uses information_schema (MySQL) and pg_constraint (PostgreSQL) to find cross-schema FK references. Diagram.cascade() and Table.drop() now call load_all_downstream() before building the dependency graph. Includes integration test: two schemas where the downstream schema has an FK to the upstream schema, verifying that cascade delete discovers and deletes from both. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
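The discovery loop can be sketched as a fixed-point iteration (toy data shape: a dict standing in for the information_schema / pg_constraint lookup). It terminates because the set of schemas on the server is finite and only grows, even when cross-schema FK references form a cycle.

```python
def load_all_downstream(loaded, referencing):
    """Iteratively add schemas holding FKs into any already-loaded schema.
    referencing: schema -> set of schemas with FKs pointing at it."""
    loaded = set(loaded)
    while True:
        new = set()
        for s in loaded:
            new |= referencing.get(s, set())
        new -= loaded
        if not new:
            return loaded
        loaded |= new
```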
1. Remove dead get_master() from utils.py — replaced by extract_master() in dependencies.py, no remaining callers. 2. Replace max_passes=10 magic number in _propagate_restrictions with a while loop. Termination is guaranteed because the dependency graph is a DAG and propagated_edges prevents re-processing. Comment explains why multiple passes are needed (part_integrity="cascade" upward propagation). 3. Replace bare except: with except Exception: in Table.delete(). Lets KeyboardInterrupt and SystemExit propagate without attempting cancel_transaction on a dying process. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_part_integrity was set in __init__, copy constructor, and cascade() but never read — part_integrity flows as a function argument to _propagate_restrictions(), not through instance state. Add comment explaining the 50-iteration safety limit in load_all_downstream() (cross-schema FK chains could theoretically cycle, unlike the DAG within a schema). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The loop terminates when no new schemas are discovered. Since the total number of schemas on the server is finite, this is guaranteed to converge — even with cross-schema cycles. No safety cap needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add usage examples to Diagram class docstring and Dependencies.load_all_downstream() showing how to include tables from unactivated downstream schemas in visualization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Never called — __iter__ and __reversed__ use the module-level topo_sort() directly. No public callers in the codebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Replace the error-driven cascade in `Table.delete()` with graph-driven restriction propagation using `Diagram`. The Diagram is purely a graph computation and inspection tool — all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.

Resolves: #865 (applying restrictions to a Diagram), #1110 (cascade delete fails on MySQL 8 with limited privileges)
Architecture
`Table.delete()` calls `Diagram.cascade(self)` to compute the affected subgraph, then iterates `reversed(diagram)` to delete leaves first. `Table.drop()` builds a Diagram with all descendants and drops in reverse topological order. The Diagram never executes mutations — it computes the cascade graph and provides `counts()` for inspection.

Public API: `Diagram.cascade()` (classmethod)

`Diagram.cascade(table_expr)` builds a complete cascade diagram from a table expression — including all descendants across all loaded schemas. It uses OR convergence: a descendant row is marked for deletion if any ancestor path reaches it.

Public API: `diagram.restrict()` (instance method)

`restrict()` propagates restrictions downstream with AND convergence — a row is included only when ALL ancestor conditions are satisfied. Chainable. `prune()` removes zero-count tables (only valid on restrict, not cascade). The two modes are mutually exclusive — `restrict()` raises on a cascade Diagram.

`Table.delete()` and `Table.drop()` changes

- `Table.delete()` uses `Diagram.cascade(self, ...)` internally; all execution logic (transactions, SQL, prompts) stays in Table
- `Table.drop()` builds a Diagram with all descendants; drops in reverse topo order
- `Part.delete()` returns the delete count; `Part.drop()` passes `part_integrity` through
- `part_integrity` post-check: data-driven, avoids false positives when a Part table appears in the cascade but has zero affected rows
- With `safemode=True`, delete previews all affected tables inside a transaction and asks "Commit deletes?" — declining rolls back everything

Cross-schema cascade discovery
`Dependencies.load_all_downstream()` iteratively discovers schemas that reference the loaded schemas via FK relationships, expanding the dependency graph until all downstream schemas are included. This ensures cascade delete and drop reach dependent tables even in schemas that haven't been explicitly activated.

Called by `Diagram.cascade()` and `Table.drop()`.

Restriction propagation rules
For edge Parent→Child with `attr_map`:

- `parent_attrs` subset of `child.primary_key` (`fk_attrs != pk_attrs`): restrict via `parent.proj(**{fk: pk for fk, pk in attr_map.items()})`
- `parent_attrs` not subset of `child.primary_key`: restrict via `parent.proj()`

Convergence: `cascade` uses OR (any path), `restrict` uses AND (all conditions).

Bug fixes
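The renamed-FK rule can be sketched with plain dicts (assumed `attr_map` shape: child FK attribute → parent PK attribute; not the actual `proj()` machinery): a restriction on the parent's PK attributes is rewritten into the child's FK attribute names.

```python
def rename_for_child(parent_restriction, attr_map):
    """Rewrite a dict restriction on parent PK attrs into child FK attr
    names, mirroring what parent.proj(**{fk: pk, ...}) achieves."""
    pk_to_fk = {pk: fk for fk, pk in attr_map.items()}
    return {pk_to_fk.get(attr, attr): value
            for attr, value in parent_restriction.items()}
```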
- `Diagram.__init__`: Custom `__iter__` created a chicken-and-egg when populating `nodes_to_show`. Fixed: use `self.nodes()` (inherited DiGraph iterator).
- Index declarations (`declare.py`): Allow inline comments on index lines.
- `add_parts()` strips both backticks and double quotes for PostgreSQL.
- Name parsing via `adapter.split_full_table_name()` — removed ad-hoc implementations.
- `repr(source)` instead of `repr(source[0])` in connection-not-found error.
- CLI tests use `sys.executable -m datajoint.cli` instead of the bare `dj` command.
- Removed dead `get_master()` from `utils.py` (replaced by `extract_master()`) and dead `_part_integrity` attribute.
- `except Exception:` instead of bare `except:` in `Table.delete()`.

Advantages over error-driven cascade
- Cross-schema discovery via `load_all_downstream()`
- `counts()` preview + safemode confirmation before commit

Files changed (14)
- `diagram.py`: `cascade()` classmethod, `restrict()`, `counts()`, `prune()`, `__iter__`/`__reversed__`, all name parsing via `adapter.split_full_table_name()`
- `table.py`: `delete()` and `drop()` own execution logic; `FreeTable.__init__` uses `adapter.split_full_table_name()`
- `dependencies.py`: `load()` accepts `schema_names`; new `load_all_downstream()`
- `adapters/base.py`: `find_downstream_schemas_sql()`
- `adapters/mysql.py`: `find_downstream_schemas_sql()`
- `adapters/postgres.py`: `find_downstream_schemas_sql()`
- `declare.py`
- `user_tables.py`: `Part.delete()` returns value; `Part.drop()` passes `part_integrity`
- `utils.py`: removed `get_master()`
- `version.py`
- `docs/design/thread-safe-mode.md`
- `test_cascade_delete.py`
- `test_erd.py`
- `test_cli.py`: `sys.executable -m datajoint.cli`

Test plan
Diagram.cascade().counts()