[SPARK-55964] Cache coherence: clear function registry on DROP DATABASE#54781
srielau wants to merge 6 commits into apache:master
Conversation
- Add dropFunctionsInDatabase(db) to FunctionRegistryBase and implementations - SessionCatalog.dropDatabase clears scalar and table function registry cache for the dropped database so resolution does not see stale entries - Add SessionCatalogSuite tests for cache coherence
cloud-fan
left a comment
Review Summary
Prior state and problem: When a database is dropped via SessionCatalog.dropDatabase, the session-level function registries (functionRegistry and tableFunctionRegistry) can retain stale entries for functions that belonged to the dropped database. Subsequent function resolution could then resolve these phantom functions.
Design approach: Add a dropFunctionsInDatabase method to FunctionRegistryBase, with a scan-and-remove implementation in SimpleFunctionRegistryBase and a no-op in EmptyFunctionRegistryBase. Call both registries from SessionCatalog.dropDatabase.
Implementation: Changes span FunctionRegistry.scala (trait + two implementations) and SessionCatalog.scala (5-line addition in dropDatabase). Two new tests in SessionCatalogSuite verify the clearing for scalar and table function registries.
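The scan-and-remove approach can be illustrated with a minimal, self-contained sketch. This is not Spark's actual `FunctionRegistryBase`; `FuncId`, `SimpleRegistry`, and the `String` builder payload are hypothetical stand-ins, kept only to show the shape of `dropFunctionsInDatabase`:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session function registry: keys carry an
// optional database name, mirroring qualified function identifiers.
final case class FuncId(database: Option[String], name: String)

final class SimpleRegistry {
  private val functions = mutable.HashMap.empty[FuncId, String]

  def register(id: FuncId, builder: String): Unit = functions.put(id, builder)

  def lookup(id: FuncId): Option[String] = functions.get(id)

  // Scan-and-remove: delete every entry whose database matches `db`,
  // so resolution cannot see stale entries after DROP DATABASE.
  def dropFunctionsInDatabase(db: String): Unit = {
    val stale = functions.keys.filter(_.database.contains(db)).toSeq
    stale.foreach(functions.remove)
  }
}
```

Entries with `database = None` (e.g. built-ins) are untouched, which matches the intent that only the dropped database's functions are evicted.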
Review threads (resolved) on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
…ysis/FunctionRegistry.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
The test failure seems to be real
invalidateCachedTable(QualifiedTableName(SESSION_CATALOG_NAME, dbName, t.table))
}
// Clear cached functions in this database so the cache stays coherent on drop.
// normalizeFuncName stores entries with catalog=None, so the filter must match that.
This is a bug I've fixed in srielau#3; we can update the code here later.
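The catalog-matching pitfall discussed here can be sketched in isolation. Assume registry keys store `catalog = None` (as the comment says `normalizeFuncName` does); a naive drop filter that compares against an explicit catalog name silently removes nothing. `FuncKey`, `dropNaive`, `dropFixed`, and the literal `"spark_catalog"` are illustrative assumptions, not Spark's real code:

```scala
// Hypothetical key shape: registry entries are stored with catalog = None.
final case class FuncKey(catalog: Option[String], database: String, name: String)

// Buggy filter: matches only keys with an explicit catalog, so entries
// stored with catalog = None survive the drop.
def dropNaive(keys: Set[FuncKey], db: String): Set[FuncKey] =
  keys.filterNot(k => k.catalog.contains("spark_catalog") && k.database == db)

// Corrected filter: matches the catalog = None form the registry actually stores.
def dropFixed(keys: Set[FuncKey], db: String): Set[FuncKey] =
  keys.filterNot(k => k.catalog.isEmpty && k.database == db)
```

With a key like `FuncKey(None, "db1", "f")`, `dropNaive` leaves the stale entry in place while `dropFixed` removes it.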
SQL tests all passed, thanks, merging to master!
What changes were proposed in this pull request?
Functions belonging to a schema dropped within the session are now removed from the session function registry.
design-cache-coherence.md
Why are the changes needed?
Without this change, the session could keep resolving functions from a schema it had dropped.
Does this PR introduce any user-facing change?
Yes. It fixes an observable bug: functions from a dropped schema could previously still be resolved in the same session.
How was this patch tested?
Added new test cases in SessionCatalogSuite.
Was this patch authored or co-authored using generative AI tooling?
Claude Opus4.6