GROOVY-10307: Improve invokedynamic performance with optimized caching by jamesfredley · Pull Request #2374 · apache/groovy

jamesfredley · 2026-01-28T23:03:02Z

This PR significantly improves invokedynamic performance in Grails 7 / Groovy 4 by implementing several key optimizations:

https://issues.apache.org/jira/browse/GROOVY-10307
apache/grails-core#15293

Using https://github.com/jamesfredley/grails7-performance-regression as a test application

10K/10K = groovy.indy.optimize.threshold/groovy.indy.fallback.threshold

Key Results

Configuration	Runtime (4GB)	Startup
No-Indy	60ms	~14s
Optimized 10K/10K	139ms	~11s
Baseline Indy	267ms	~14s

Problem Statement

Root Cause Analysis

The performance in the Grails 7 Test Application was traced to the global SwitchPoint guard in Groovy's invokedynamic implementation:

Global SwitchPoint Invalidation: When ANY metaclass changes anywhere in the application, ALL call sites are invalidated
Grails Framework Behavior: Grails frequently modifies metaclasses during request processing (controllers, services, domain classes)
Cascade Effect: Each metaclass change triggers mass invalidation, forcing all call sites to re-resolve their targets
Guard Failures: The constant invalidation causes guards to fail repeatedly, preventing the JIT from optimizing method handle chains

Optimizations Applied

Disabled Global SwitchPoint Guard
- Removed the global SwitchPoint that caused ALL call sites to invalidate when ANY metaclass changed
- Individual call sites now manage their own invalidation via cache clearing
- Can be re-enabled with: -Dgroovy.indy.switchpoint.guard=true
Call Site Cache Invalidation
- Implemented call site registration mechanism (ALL_CALL_SITES set with weak references)
- When metaclass changes, all registered call sites have their caches cleared
- Targets reset to default (fromCache) path, ensuring metaclass changes are visible
- Garbage-collected call sites are automatically removed

Optimizations Removed (No Measurable Benefit)

Testing showed these additional optimizations provide no measurable performance benefit while adding complexity:

~~Monomorphic Fast-Path~~ (REMOVED)
- ~~Added volatile fields (latestClassName/latestMethodHandleWrapper) to CacheableCallSite~~
- ~~Skips synchronization and map lookups for call sites that consistently see the same types~~
- Removal Reason: No measurable improvement in Grails 7 benchmark; added complexity
~~Optimized Cache Operations~~ (REMOVED)
- ~~Added get() method for cache lookup without automatic put~~
- ~~Added putIfAbsent() for thread-safe cache population~~
- Removal Reason: No measurable improvement; existing getAndPut() sufficient
~~Pre-Guard Method Handle Storage~~ (REMOVED)
- ~~Selector stores the method handle before argument type guards are applied (handleBeforeArgGuards)~~
- ~~Enables potential future optimizations for direct invocation~~
- Removal Reason: Future optimization not implemented; added unused code

What Was NOT Changed

Default thresholds remain 10,000/10,000 (same as baseline GROOVY_4_0_X)
All guards remain intact - guards are NOT bypassed, only the global SwitchPoint is disabled
Cache semantics unchanged - LRU eviction, soft references, same default size (4)

Performance Results

Test Environment

Java: Amazon Corretto JDK 25.0.2
OS: Windows 11
Test Application: Grails 7 performance benchmark
Data Set: 5 Companies, 29 Departments, 350 Employees, 102 Projects, 998 Tasks, 349 Milestones, 20 Skills
Test Method: 5 warmup + 50 test iterations per run, 5 runs per configuration

Runtime Performance Comparison (4GB Heap - Recommended)

Configuration	Min	Max	Avg	Stability
No-Indy	55ms	~85ms	60ms	Excellent
Optimized 10K/10K	128ms	~216ms	139ms	Good
Baseline Indy	224ms	~336ms	267ms	Good

Startup Time Comparison

Configuration	4GB	Notes
No-Indy	~14s	Consistent
Baseline Indy	~14s	Similar to no-indy
Optimized 10K/10K	~11s	Fastest startup

Visual Performance Summary (4GB Heap)

Runtime Performance - Lower is Better
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
No-Indy:        ███ 60ms
Opt 10K/10K:    ███████ 139ms         
Baseline Indy:  █████████████ 267ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Startup Time - Lower is Better
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Opt 10K/10K:    ███████ 11s           
No-Indy:        ████████████ 14s
Baseline Indy:  ████████████ 14s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Configuration Test Matrix

All tests performed on January 30, 2026 with JDK 25.0.2 (Corretto), 4GB heap (-Xmx4g -Xms2g). Runtime values are stabilized averages (runs 3-5).

Configuration	Startup	Runtime (Avg)	Min	Max
No-Indy	~14s	60ms	55ms	85ms
Baseline Indy	~14s	267ms	224ms	336ms
Optimized 10K/10K	~11s	139ms	128ms	216ms

Failed Optimization Attempts

These approaches were tried but either caused test failures, provided insufficient benefit (<6%), or made performance worse. They are documented here to prevent repeating the same mistakes.

UNSAFE - Caused Test Failures

1. Skip Metaclass Guards for Cache Hits

What it tried: Store method handle before guards, use unguarded handle for cache hits.

Why it failed: Call sites can be polymorphic (called with multiple receiver types). The unguarded handles contain explicitCastArguments with embedded type casts that throw ClassCastException when different types pass through.

2. Version-Based Metaclass Guards

What it tried: Use metaclass version instead of identity for guards, skip guards for "cache hits."

Why it failed: Same polymorphic call site issue - skipping guards for cache hits is fundamentally unsafe.

3. Skip Argument Guards After Warmup

What it tried: After warmup period, use handles without argument type guards.

Why it failed: Call sites can receive different argument types over their lifetime. Skipping argument guards causes ClassCastException or IncompatibleClassChangeError.

4. Per-Class Metaclass Invalidation

What it tried: Only invalidate call sites for the specific class whose metaclass changed, not all call sites globally.

Why it failed: Broke mock framework tests (MockFor, StrictExpectation). The mock framework works by temporarily replacing metaclass and expects ALL call sites to see the change immediately - not just call sites for that class.

5. Monomorphic Fast-Path with Identity Check

What it tried: Store latest receiver class name and method handle for quick identity check before hash lookup.

Why it failed: UNSAFE with per-instance metaclasses. Causes NPE when NULL_METHOD_HANDLE_WRAPPER gets cached for stale handles.

INEFFECTIVE - Less Than 6% Improvement

6. Increase Cache Size (4 to 16)

What it tried: Increase the per-call-site LRU cache from 4 to 16 entries to improve polymorphic call site performance.

Result: Less than 6% improvement in Grails benchmark. The polymorphic benchmark uses way more than 16 types, so cache size doesn't help.

7. Per-Class Generation Tokens (JRuby Pattern)

What it tried: Add classInfoVersion field to cache entries. On each call, compare cache token with receiver class generation for cheap integer validation instead of hash lookup.

Result: Less than 6% improvement. Category and mock tests still require global cache clearing anyway, so generation tokens provide little benefit.

8. Optimized Cache Operations (get/putIfAbsent methods)

What it tried: Add separate get() and putIfAbsent() methods instead of combined getAndPut().

Result: No measurable benefit. The overhead is elsewhere, not in cache operations.

9. Pre-Guard Method Handle Storage

What it tried: Store method handle before argument type guards are applied (handleBeforeArgGuards, directMethodHandle) for potential future optimizations.

Result: No measurable benefit. Added unused code complexity.

HARMFUL - Made Performance Worse

10. Bimorphic Inline Cache

What it tried: Add a 2-slot inline cache with identity comparison before HashMap lookup. Fast path for call sites that see 1-2 types.

Results:

Polymorphic benchmark: 1.42s vs 1.60s (11% improvement)
Grails app: 127ms vs 132ms (4% SLOWER)
Metaclass ratio: 56.54x vs 66.76x (18% worse)

Why it failed: Adds overhead on every call site access. While it helps highly polymorphic sites, it hurts monomorphic-dominant workloads like typical Grails apps.

11. Parallel Stream for Cache Invalidation (Approach 1)

What it tried: Use parallelStream().filter().collect() with atomic swap for call site invalidation.

Result: Made Grails app 18% SLOWER (167ms vs 141ms baseline).

12. Parallel Stream for Cache Invalidation (Approach 2)

What it tried: Use parallelStream().forEach() for invalidation + removeIf() for cleanup.

Result: Made Grails app 12% SLOWER (158ms vs 141ms baseline).

Why parallel streams failed: Thread coordination overhead outweighs parallelization benefit for typical call site counts (~1000-5000 sites). Sequential removeIf has better cache locality and no synchronization overhead.

Key Learnings

Any optimization that bypasses guards based on cache state is unsafe because:

The cache key (receiver Class) identifies which handle to use
But the call site itself can receive different types over its lifetime
Guards in the method handle chain ensure proper fallback when types don't match
The JVM's explicitCastArguments will throw if types don't match

Safe optimizations must:

Keep all guards intact
Focus on reducing guard overhead, not bypassing guards
Reduce cache misses rather than optimize the miss path unsafely

Performance tradeoffs:

Optimizations that help polymorphic call sites (bimorphic cache, larger cache) often hurt monomorphic-dominant workloads
Per-class invalidation breaks Groovy's mock framework which relies on immediate global metaclass visibility
Parallel processing doesn't help for small collections (~1000-5000 call sites)

Remaining Performance Gap

The optimized indy is still ~2.3x slower than non-indy (139ms vs 60ms). The remaining gap comes from:

Guard evaluation overhead - Every call must check type guards
Method handle chain traversal - Composed handles have inherent overhead
Cache lookup cost - Cache lookup overhead on each call
Fallback path cost - Cache misses trigger full method resolution

Potential Future Optimization Areas

Reduce guard evaluation cost - Make guards cheaper (not skip them)
Improve cache hit rate - Larger cache, better eviction policy
Reduce method handle chain depth - Fewer composed handles
JIT-friendly patterns - Ensure method handle chains can be inlined
Profile-guided optimization - Use runtime profiles to specialize hot paths
Megamorphic threshold detection - Track relink count per call site, use simpler unoptimized path after threshold (Nashorn/Scala pattern)

Configuration Reference

System Properties

Property	Default	Description
`groovy.indy.optimize.threshold`	10000	Hit count before setting guarded target on call site
`groovy.indy.fallback.threshold`	10000	Fallback count before resetting to cache lookup path
`groovy.indy.callsite.cache.size`	4	Size of LRU cache per call site
`groovy.indy.switchpoint.guard`	false	Enable global SwitchPoint guard (NOT recommended)

Files Modified

Source Files

File	Lines	Changes
`Selector.java`	+16	Disabled SwitchPoint guard by default
`IndyInterface.java`	+41	Call site registration, cache invalidation on metaclass change
`CacheableCallSite.java`	+17	Added `clearCache()` method
Total	+69	Minimal changes for maximum impact

This commit significantly improves invokedynamic performance in Grails 7 / Groovy 4 by implementing several key optimizations: 1. **Disabled Global SwitchPoint Guard**: Removed the global SwitchPoint that caused ALL call sites to invalidate when ANY metaclass changed. In Grails and similar frameworks with frequent metaclass modifications, this was causing massive guard failures and performance degradation. Individual call sites now manage their own invalidation via cache clearing. Can be re-enabled with: -Dgroovy.indy.switchpoint.guard=true 2. **Monomorphic Fast-Path**: Added volatile fields (latestClassName/ latestMethodHandleWrapper) to CacheableCallSite to skip synchronization and map lookups for call sites that consistently see the same types, which is the common case in practice. 3. **Call Site Cache Invalidation**: Added call site registration and cache clearing mechanism. When metaclass changes, all registered call sites have their caches cleared and targets reset to the default (fromCache) path, ensuring metaclass changes are visible. 4. **Optimized Cache Operations**: Added get() and putIfAbsent() methods to CacheableCallSite for more efficient cache access patterns, and clearCache() for metaclass change invalidation. 5. **Pre-Guard Method Handle Storage**: Selector now stores the method handle before argument type guards are applied (handleBeforeArgGuards), enabling potential future optimizations for direct invocation. The thresholds can be tuned via system properties: - groovy.indy.optimize.threshold (default: 10000) - groovy.indy.fallback.threshold (default: 10000) - groovy.indy.callsite.cache.size (default: 4) - groovy.indy.switchpoint.guard (default: false)

jamesfredley · 2026-01-28T23:25:16Z

I do not believe the additional (10) failure was caused by these changes, since this commit already passed earlier today jamesfredley@510762d, but I do not have permission to rerun the job.

Remove monomorphic fast-path, optimized cache ops, and pre-guard method handle storage. Testing showed these provide no measurable benefit in Grails 7 benchmarks while adding complexity. The minimal optimization (disabled SwitchPoint guard + cache invalidation) achieves the same performance improvement with 75% less code (69 lines vs 277 lines). Removed: - Monomorphic fast-path (latestClassName/latestMethodHandleWrapper fields) - Optimized cache operations (get/putIfAbsent methods) - Pre-guard method handle storage (handleBeforeArgGuards, directMethodHandle) Retained: - Disabled global SwitchPoint guard by default - Cache invalidation on metaclass changes - Call site registration for invalidation tracking

jamesfredley · 2026-01-30T04:30:43Z

Optimizations Retained

Disabled Global SwitchPoint Guard (Selector.java)
- Removed the global SwitchPoint that caused ALL call sites to invalidate when ANY metaclass changed
- Can be re-enabled with: -Dgroovy.indy.switchpoint.guard=true
Call Site Cache Invalidation (IndyInterface.java, CacheableCallSite.java)
- Implemented call site registration mechanism (ALL_CALL_SITES set with weak references)
- When metaclass changes, all registered call sites have their caches cleared
- Targets reset to default (fromCache) path, ensuring metaclass changes are visible

Optimizations Removed (No Measurable Benefit)

Monomorphic Fast-Path - latestClassName/latestMethodHandleWrapper volatile fields
Optimized Cache Operations - get(), putIfAbsent() methods
Pre-Guard Method Handle Storage - handleBeforeArgGuards, directMethodHandle fields

Testing showed these additional optimizations provide no measurable performance benefit in the Grails 7 benchmark while adding complexity.

jonnybot0 · 2026-01-30T19:59:54Z

src/main/java/org/codehaus/groovy/vmplugin/v8/Selector.java

+            // sufficient, combined with cache invalidation on metaclass changes.
+            // 
+            // If you need strict metaclass change detection, set groovy.indy.switchpoint.guard=true
+            if (SystemUtil.getBooleanSafe("groovy.indy.switchpoint.guard")) {


The one question I have about this is whether this check ought to be inverted. That is, should the default behaviour be the old way, which we know is less performant but more battle-tested, or this way? Could Grails simply turn this flag on by default, thus solving their problem, without impacting other users?

Also, as a nitpicky thing, I reckon this string ought to be a public static final field so that we could add some tests around it.

If this PRs approach is viable, but does not prove to be generally valuable for all/most code run with Indy on, than I think the inverse would be fine for Grails and we could set that flag in default generated applications.

This is part of why I'd like to run this branch against the current benchmarks on CI. That would likely provide some confidence as to whether this was "safe" to merge without impacting anyone's expectations about how the language behaves.

jonnybot0 · 2026-01-30T20:10:56Z

I'm also well outside my expertise here, but so long as we're doing this, I'd be curious to see how this runs against the performance tests we have, compared to baseline.
https://ci-builds.apache.org/job/Groovy/job/Groovy%204%20performance%20tests/ has a Jenkins build that runs these. That's no good for running against a community repository, but perhaps we could push this up to a different branch and run it. Little outside my usual wheelhouse, so I'd like the nod from @paulk-asert, @blackdrag, or someone else on the PMC before I just run it willy-nilly.

I tried running ./gradlew :performance:performanceTests on my local machine with Java 17 on this branch:

Groovy 4.0.7 Average 376.26ms ± 64.42ms 
Groovy current Average 376.87ms ± 63.75ms (0.16% slower)
Groovy 3.0.14 Average 443.74ms ± 131.81ms (17.93% slower)
Groovy 2.5.20 Average 451.40999999999997ms ± 138.82ms (19.97% slower)

That seems comparable, so my assumption is this hasn't hurt the most basic performance measures we do. Main reason I'd want CI confirmation is to make sure it wasn't a quirk of my machine.

jamesfredley · 2026-01-31T04:01:11Z

Based on GROOVY-10307 and GROOVY-11842, I created another test project: https://github.com/jamesfredley/groovy-indy-performance

jamesfredley · 2026-01-31T04:57:16Z

Test Date: January 30, 2026
Runs per Configuration: 3
Environment: Windows 11, Java 17.0.18, 4GB heap, 20 CPUs

Master Comparison Table

All values are averages across 3 runs. Times in milliseconds unless noted. Bold indicates best performance.

Loop Benchmarks (ComprehensiveBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Loop: each + toString	33.09 ms	33.23 ms	32.70 ms
Loop: collect	52.62 ms	52.22 ms	54.59 ms
Loop: findAll	113.08 ms	116.15 ms	114.58 ms

Method Invocation Benchmarks (ComprehensiveBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Method: simple instance	8.77 ms	8.55 ms	8.94 ms
Method: with params	10.00 ms	10.08 ms	10.52 ms
Method: static	9.19 ms	7.73 ms	8.25 ms
Method: polymorphic	163.35 ms	1.70 s	1.88 s

Closure Benchmarks (ComprehensiveBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Closure: creation + call	21.45 ms	24.89 ms	24.54 ms
Closure: reused	19.44 ms	19.86 ms	18.59 ms
Closure: nested	39.25 ms	37.51 ms	38.22 ms
Closure: curried	154.37 ms	159.80 ms	161.86 ms

Other Benchmarks (ComprehensiveBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Property: read/write	20.80 ms	18.70 ms	19.33 ms
Collection: each	108.71 ms	107.19 ms	110.03 ms
Collection: collect	121.07 ms	116.21 ms	121.74 ms
Collection: inject	127.13 ms	132.85 ms	139.18 ms
GString: simple	101.22 ms	102.89 ms	105.09 ms
GString: multi-value	112.55 ms	116.85 ms	117.61 ms

Method Invocation Benchmarks (MethodInvocationBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Simple instance method	7.23 ms	6.59 ms	8.22 ms
Method with parameters	7.76 ms	8.08 ms	8.31 ms
Method with object param	11.41 ms	10.74 ms	11.10 ms
Static method	1.24 ms	3.47 ms	3.40 ms
Static method with params	1.17 ms	7.75 ms	8.11 ms
Monomorphic call site	46.89 ms	130.65 ms	126.87 ms
Polymorphic call site	382.26 ms	3.52 s	3.85 s
Interface method	6.34 ms	3.53 ms	3.42 ms
Dynamic typed calls	5.86 ms	3.04 ms	3.18 ms
Property access	27.52 ms	21.49 ms	22.58 ms
Method reference	34.40 ms	31.01 ms	31.25 ms
GString method	190.04 ms	193.28 ms	194.07 ms

Closure Benchmarks (ClosureBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Simple closure creation	32.52 ms	30.21 ms	31.72 ms
Closure reuse	22.79 ms	19.86 ms	21.15 ms
Multi-param closure	39.00 ms	38.34 ms	39.22 ms
Closure with capture	23.19 ms	19.94 ms	20.34 ms
Closure modify capture	14.30 ms	12.55 ms	13.45 ms
Closure delegation	40.33 ms	27.24 ms	31.99 ms
Nested closures	59.92 ms	56.24 ms	59.72 ms
Curried closure	291.59 ms	296.21 ms	289.39 ms
Right curried closure	188.59 ms	187.98 ms	189.34 ms
Closure composition	64.37 ms	57.93 ms	59.37 ms
Closure as parameter	42.64 ms	40.58 ms	41.48 ms
Closure spread	177.89 ms	2.10 s	2.10 s
Closure.call()	25.69 ms	22.22 ms	25.49 ms
Closure trampoline	55.88 ms	50.72 ms	55.22 ms
each with closure	28.55 ms	26.44 ms	28.72 ms
collect with closure	27.95 ms	26.13 ms	28.00 ms
findAll with closure	35.72 ms	37.14 ms	42.34 ms
inject with closure	33.69 ms	33.41 ms	36.29 ms

Loop Benchmarks (LoopsBenchmark)

Benchmark	NO-INDY	Optimized-INDY	Baseline-INDY
Original: each + toString	43.65 ms	45.55 ms	45.90 ms
Simple: each only	38.93 ms	42.03 ms	42.67 ms
Closure call	22.63 ms	19.77 ms	21.99 ms
Method call	8.14 ms	6.22 ms	5.39 ms
Nested loops	70.82 ms	72.94 ms	77.47 ms
Loop with collect	89.20 ms	90.26 ms	90.67 ms
Loop with findAll	209.40 ms	204.75 ms	209.98 ms

Call Site Invalidation (Critical for GROOVY-10307)

Metric	NO-INDY	Optimized-INDY	Baseline-INDY
Metaclass change ratio	1.27x	69.4x	102.6x
Baseline time (no changes)	6.83 ms	7.44 ms	5.18 ms
With changes time	8.51 ms	439.54 ms	531.72 ms

Summary Table: Key Metrics

Metric	NO-INDY	Optimized-INDY	Baseline-INDY
Polymorphic call site	382 ms	3.52 s	3.85 s
Metaclass change ratio	1.27x	69.4x	102.6x
Static method	1.24 ms	3.47 ms	3.40 ms
Monomorphic call site	46.89 ms	130.65 ms	126.87 ms
Closure spread	178 ms	2.10 s	2.10 s
Simple instance method	7.23 ms	6.59 ms	8.22 ms
Closure delegation	40.33 ms	27.24 ms	31.99 ms
Property access	27.52 ms	21.49 ms	22.58 ms
Interface method	6.34 ms	3.53 ms	3.42 ms
Dynamic typed calls	5.86 ms	3.04 ms	3.18 ms

daniellansun · 2026-01-31T21:48:59Z

src/main/java/org/codehaus/groovy/vmplugin/v8/IndyInterface.java

+
+        // Invalidate all call site caches and reset targets to default (cache lookup)
+        // This ensures metaclass changes are visible without using expensive switchpoint guards
+        ALL_CALL_SITES.removeIf(ref -> {


The size of ALL_CALL_SITES could be very big, so it's better to use parallel stream for better performance.

When metaclass changes, all registered call sites have their caches cleared

Current implementation will clear all cache no matter whether the changed metaclasses are releated.
Groovy 5 has lookup instance in the CacheableCallSite, it can help us determine which class is using the cache site, maybe we can backport it to Groovy 4:

groovy/src/main/java/org/codehaus/groovy/vmplugin/v8/CacheableCallSite.java

Line 47 in 249491a

private final MethodHandles.Lookup lookup;

I tested two versions of the parallel stream change with the two test application and saw a reduction in performance.

Metric Baseline Approach 1 Approach 2

Grails App (steady state) ~141 ms ~167 ms (18% SLOWER) ~158 ms (12% SLOWER)

Approach 1: Parallel Collect + Atomic Swap - 18% SLOWER

// Invalidate all call site caches and reset targets to default (cache lookup) // This ensures metaclass changes are visible without using expensive switchpoint guards Set<WeakReference<CacheableCallSite>> liveReferences = ALL_CALL_SITES.parallelStream() .filter(ref -> { CacheableCallSite cs = ref.get(); if (cs == null) { return false; // Don't keep garbage collected references } // Reset target to default (fromCache) so next call goes through cache lookup MethodHandle defaultTarget = cs.getDefaultTarget(); if (defaultTarget != null && cs.getTarget() != defaultTarget) { cs.setTarget(defaultTarget); } // Clear the cache so stale method handles are discarded cs.clearCache(); return true; // Keep live references }) .collect(Collectors.toSet()); // Atomic swap - clear and replace with live references only ALL_CALL_SITES.clear(); ALL_CALL_SITES.addAll(liveReferences);

Approach 2: Parallel ForEach + Sequential RemoveIf - 12% SLOWER
// Invalidate all call site caches in parallel

ALL_CALL_SITES.parallelStream().forEach(ref -> { CacheableCallSite cs = ref.get(); if (cs != null) { // Reset target to default (fromCache) so next call goes through cache lookup MethodHandle defaultTarget = cs.getDefaultTarget(); if (defaultTarget != null && cs.getTarget() != defaultTarget) { cs.setTarget(defaultTarget); } // Clear the cache so stale method handles are discarded cs.clearCache(); } }); // Remove garbage collected references (must be sequential for ConcurrentHashMap.KeySetView) ALL_CALL_SITES.removeIf(ref -> ref.get() == null);

Testing with the Groovy 5 caller-specific Lookup pattern is generally positive on the micro benchmarks but 11% slower in the grails test application:

More details: jamesfredley#1

src/main/java/org/codehaus/groovy/vmplugin/v8/IndyInterface.java

src/main/java/org/codehaus/groovy/vmplugin/v8/Selector.java

Copilot

Pull request overview

This PR improves invokedynamic performance in Groovy by addressing performance degradation caused by the global SwitchPoint guard. When any metaclass changes, the global SwitchPoint invalidates ALL call sites, causing severe performance issues in frameworks like Grails that frequently modify metaclasses.

Changes:

Disabled the global SwitchPoint guard by default (can be re-enabled via system property)
Implemented call site registration using WeakReferences for targeted cache invalidation
Added cache clearing mechanism that resets call sites to default targets when metaclasses change

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
Selector.java	Added system property to control SwitchPoint guard, disabled by default to avoid global invalidation overhead
IndyInterface.java	Implemented WeakReference-based call site registry and cache invalidation logic to handle metaclass changes without global SwitchPoint
CacheableCallSite.java	Added clearCache() method to discard stale method handles when metaclass changes occur

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/main/java/org/codehaus/groovy/vmplugin/v8/IndyInterface.java

src/main/java/org/codehaus/groovy/vmplugin/v8/Selector.java

src/main/java/org/codehaus/groovy/vmplugin/v8/CacheableCallSite.java

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/main/java/org/codehaus/groovy/vmplugin/v8/CacheableCallSite.java

- Document best-effort invalidation strategy in invalidateSwitchPoints(): concurrent threads may briefly reinstall optimized targets, but method handle guards (SAME_MC, SAME_CLASS) ensure correctness by forcing fallback - Keep clearCache() volatile field access outside synchronized block for consistency with getAndPut() which also accesses it outside the lock; volatile provides sufficient visibility guarantees for this field - Add comprehensive Javadoc for INDY_SWITCHPOINT_GUARD explaining performance implications of enabling the global switchpoint guard - Document WeakReference cleanup behavior in ALL_CALL_SITES and registerCallSite() to clarify garbage collection semantics - Simplify inline comment in setGuards() with detailed Javadoc on field

paulk-asert

I am generally in favor of this change but not for GROOVY_4_0_X in the first instance. Safest would be master only but I understand folks are still by-and-large porting to GROOVY 5 rather than looking at 6, so we might need to get it in 5 for further feedback. I still have to think further about scenarios for when it is safe vs unsafe for the proposed default settings. I am not averse to potentially eventually porting to 4 but there are lots of folks on 4 and I am unsure we understand enough which ones might be adversely affected.

jamesfredley · 2026-02-04T20:35:33Z

PR against master(Groovy 6): #2377

jamesfredley mentioned this pull request Jan 28, 2026

Groovy - invoke dynamic performance problems apache/grails-core#15293

Open

eric-milles requested a review from blackdrag January 29, 2026 01:53

jonnybot0 reviewed Jan 30, 2026

View reviewed changes

daniellansun reviewed Jan 31, 2026

View reviewed changes

daniellansun requested a review from paulk-asert January 31, 2026 22:20

daniellansun reviewed Jan 31, 2026

View reviewed changes

src/main/java/org/codehaus/groovy/vmplugin/v8/IndyInterface.java Outdated Show resolved Hide resolved

daniellansun reviewed Feb 1, 2026

View reviewed changes

src/main/java/org/codehaus/groovy/vmplugin/v8/Selector.java Outdated Show resolved Hide resolved

GROOVY-10307: Cache switchpoint guard system property in constant

43d99f9

jamesfredley force-pushed the improve-indy-performance branch 2 times, most recently from 70ff21f to 860f460 Compare February 1, 2026 01:42

GROOVY-10307: Set initial capacity for call site registry

ca36b8c

jamesfredley force-pushed the improve-indy-performance branch from 860f460 to ca36b8c Compare February 1, 2026 01:43

jamesfredley marked this pull request as ready for review February 1, 2026 20:34

Copilot AI review requested due to automatic review settings February 1, 2026 20:34

Copilot started reviewing on behalf of jamesfredley February 1, 2026 20:34 View session

Copilot AI reviewed Feb 1, 2026

View reviewed changes

jamesfredley requested a review from Copilot February 1, 2026 23:06

Copilot started reviewing on behalf of jamesfredley February 1, 2026 23:07 View session

Copilot AI reviewed Feb 1, 2026

View reviewed changes

src/main/java/org/codehaus/groovy/vmplugin/v8/CacheableCallSite.java Outdated Show resolved Hide resolved

jamesfredley force-pushed the improve-indy-performance branch from 6c6593e to 868cfc5 Compare February 1, 2026 23:49

jamesfredley force-pushed the improve-indy-performance branch from 868cfc5 to 8c05bd5 Compare February 2, 2026 00:33

paulk-asert requested changes Feb 4, 2026

View reviewed changes

jamesfredley mentioned this pull request Feb 4, 2026

GROOVY-10307: Groovy 6 - Improve invokedynamic performance with optimized caching #2377

Open

Conversation

jamesfredley commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key Results

Problem Statement

Root Cause Analysis

Optimizations Applied

Optimizations Removed (No Measurable Benefit)

What Was NOT Changed

Performance Results

Test Environment

Runtime Performance Comparison (4GB Heap - Recommended)

Startup Time Comparison

Visual Performance Summary (4GB Heap)

Configuration Test Matrix

Failed Optimization Attempts

UNSAFE - Caused Test Failures

1. Skip Metaclass Guards for Cache Hits

2. Version-Based Metaclass Guards

3. Skip Argument Guards After Warmup

4. Per-Class Metaclass Invalidation

5. Monomorphic Fast-Path with Identity Check

INEFFECTIVE - Less Than 6% Improvement

6. Increase Cache Size (4 to 16)

7. Per-Class Generation Tokens (JRuby Pattern)

8. Optimized Cache Operations (get/putIfAbsent methods)

9. Pre-Guard Method Handle Storage

HARMFUL - Made Performance Worse

10. Bimorphic Inline Cache

11. Parallel Stream for Cache Invalidation (Approach 1)

12. Parallel Stream for Cache Invalidation (Approach 2)

Key Learnings

Remaining Performance Gap

Potential Future Optimization Areas

Configuration Reference

System Properties

Files Modified

Source Files

Uh oh!

jamesfredley commented Jan 28, 2026

Uh oh!

jamesfredley commented Jan 30, 2026

Optimizations Retained

Optimizations Removed (No Measurable Benefit)

Uh oh!

jonnybot0 Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

jamesfredley Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

jonnybot0 Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

jonnybot0 commented Jan 30, 2026

Uh oh!

jamesfredley commented Jan 31, 2026

Uh oh!

jamesfredley commented Jan 31, 2026

Master Comparison Table

Loop Benchmarks (ComprehensiveBenchmark)

Method Invocation Benchmarks (ComprehensiveBenchmark)

Closure Benchmarks (ComprehensiveBenchmark)

Other Benchmarks (ComprehensiveBenchmark)

Method Invocation Benchmarks (MethodInvocationBenchmark)

Closure Benchmarks (ClosureBenchmark)

Loop Benchmarks (LoopsBenchmark)

Call Site Invalidation (Critical for GROOVY-10307)

Summary Table: Key Metrics

Uh oh!

daniellansun Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

daniellansun Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jamesfredley Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

jamesfredley commented Jan 28, 2026 •

edited

Loading

daniellansun Feb 1, 2026 •

edited

Loading

jamesfredley Feb 1, 2026 •

edited

Loading