
Add async worker execution mode #728

Open
crmne wants to merge 14 commits into rails:main from crmne:async-worker-execution-mode

crmne commented Apr 4, 2026

Summary

Hi @rosa, I finally had some time to work on this after our earlier conversation about async worker execution mode.

This PR is a first implementation of async worker execution mode for Solid Queue. I'm also running benchmarks in parallel and can follow up with numbers once I have stable results.

Workers can now be configured with execution_mode: :async (or :fiber), which runs claimed jobs as fibers on a single async reactor thread and bounds concurrency with capacity / fibers instead of a thread pool.

This is separate from supervisor async mode. Supervisor mode still controls whether managed processes run in forks or threads; this change adds an async execution backend for workers themselves.
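The following minimal, stdlib-only Ruby sketch makes the concurrency model concrete. Claimed jobs run as fibers on a single thread, and at most `capacity` of them are in flight at once. It is not the PR's `AsyncPool` and does not use the async gem. The `TinyFiberPool` class and its `post`/`run` methods are hypothetical, and a bare `Fiber.yield` stands in for an I/O wait. A real async reactor resumes a fiber when its I/O completes, not in round-robin order.

```ruby
# Hypothetical sketch: run claimed jobs as fibers on one thread, with at most
# `capacity` in flight. Fiber.yield stands in for a non-blocking I/O wait; a real
# async reactor would resume a fiber when its I/O completes, not round-robin.
class TinyFiberPool
  attr_reader :max_inflight

  def initialize(capacity:)
    @capacity = capacity
    @pending = []     # jobs claimed but not yet started
    @running = []     # fibers currently in flight
    @max_inflight = 0 # highest number of fibers in flight at once
  end

  def post(&job)
    @pending << job
  end

  # Runs every posted job to completion on the calling thread.
  def run
    until @pending.empty? && @running.empty?
      # Start new fibers only while there is spare capacity.
      while @running.size < @capacity && !@pending.empty?
        job = @pending.shift
        @running << Fiber.new(&job)
      end
      @max_inflight = [@max_inflight, @running.size].max

      # Resume each in-flight fiber once, then drop the ones that finished.
      @running.each(&:resume)
      @running.select!(&:alive?)
    end
  end
end

log = []
pool = TinyFiberPool.new(capacity: 2)
3.times do |i|
  pool.post do
    log << "start #{i}"
    Fiber.yield # simulated I/O wait: another job gets the thread
    log << "finish #{i}"
  end
end
pool.run
p log # job 2 starts only after a slot frees up
# prints ["start 0", "start 1", "finish 0", "finish 1", "start 2", "finish 2"]
```

Nothing here runs in parallel. Jobs overlap only by giving up the thread while they wait, which is why this mode targets mostly I/O-bound work.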

What Changed

  • added an execution pool abstraction for workers
  • introduced SolidQueue::ExecutionPools::AsyncPool
  • moved existing thread-pool behavior into SolidQueue::ExecutionPools::ThreadPool
  • added execution_mode: :async, with :fiber as a configuration alias
  • added capacity / fibers as the clearer async-worker concurrency options
  • updated workers to build execution pools after fork/boot rather than during initialize
  • updated worker metadata to report execution mode, capacity, and in-flight work across both backends
  • made workers wake up when capacity becomes available again, so async execution integrates cleanly with the poll loop
  • made async pool cancellation fatal, so reactor failures surface instead of being silently ignored
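Here is a rough picture of the pool abstraction and the wake-up change. The worker only asks a pool for spare capacity and hands it work. The pool calls back whenever a slot frees up, and a worker could use that callback to cut its poll-loop sleep short. The thread-backed pool below is a hypothetical illustration. `SketchThreadPool`, `available_capacity`, `post`, the completion callback, and `shutdown` are assumed names, not Solid Queue's actual API. A fiber-backed pool could expose the same methods.

```ruby
# Hypothetical sketch of a capacity-tracking execution pool backed by threads.
# A fiber-backed pool could expose the same methods, so a worker would not need
# to know which backend it is using.
class SketchThreadPool
  def initialize(capacity:, &on_available)
    @capacity = capacity
    @inflight = 0
    @on_available = on_available # invoked each time a job finishes
    @mutex = Mutex.new
    @threads = []
  end

  def available_capacity
    @mutex.synchronize { @capacity - @inflight }
  end

  def post(&job)
    @mutex.synchronize do
      raise ArgumentError, "no capacity available" if @inflight >= @capacity
      @inflight += 1
    end

    thread = Thread.new do
      job.call
    ensure
      @mutex.synchronize { @inflight -= 1 }
      # Tell the worker a slot is free so it can claim more work right away.
      @on_available&.call
    end
    @mutex.synchronize { @threads << thread }
  end

  # Waits for every job posted so far to finish.
  def shutdown
    @mutex.synchronize { @threads.dup }.each(&:join)
  end
end

wakeups = Queue.new
pool = SketchThreadPool.new(capacity: 2) { wakeups << :capacity_available }

gate = Queue.new
2.times { pool.post { gate.pop } } # both jobs block until released
full = pool.available_capacity      # 0: no room for a third job yet

2.times { gate << :go }
pool.shutdown
puts "free slots: #{pool.available_capacity}, wake-ups: #{wakeups.size}"
```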

Configuration / Validation

This PR also tightens async worker validation:

  • async workers require fiber-scoped isolated execution state
  • async workers fail early if the async gem is not available
  • async workers reject threads: and require capacity or fibers instead
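The checks above can be sketched as a plain function over a worker's options hash. This is an illustration, not the PR's code. `validate_async_worker_options` is a hypothetical name. The sketch assumes thread mode is the default, and it takes the isolation level as an argument instead of reading it from Rails. In a Rails app, that value comes from `ActiveSupport::IsolatedExecutionState.isolation_level`, which is set with `config.active_support.isolation_level = :fiber`.

```ruby
# Hypothetical sketch of the async-worker checks described above.
# Assumes thread mode is the default; :fiber is treated as an alias for :async.
EXECUTION_MODE_ALIASES = { fiber: :async }.freeze

def async_gem_available?
  require "async"
  true
rescue LoadError
  false
end

# Returns a list of error messages; an empty list means the options are valid.
def validate_async_worker_options(options, isolation_level:, async_available: async_gem_available?)
  mode = options.fetch(:execution_mode, :thread).to_sym
  mode = EXECUTION_MODE_ALIASES.fetch(mode, mode)
  return [] unless mode == :async # thread workers keep their existing rules

  errors = []
  unless isolation_level == :fiber
    errors << "async workers need fiber-scoped isolated execution state"
  end
  errors << "async workers need the async gem" unless async_available
  if options.key?(:threads)
    errors << "async workers do not accept threads:; set capacity or fibers"
  end
  unless options.key?(:capacity) || options.key?(:fibers)
    errors << "async workers need capacity or fibers"
  end
  errors
end

ok = validate_async_worker_options(
  { execution_mode: :fiber, fibers: 50 },
  isolation_level: :fiber, async_available: true
)
bad = validate_async_worker_options(
  { execution_mode: :async, threads: 5 },
  isolation_level: :thread, async_available: true
)
p ok # prints []
puts bad
```

Returning every error at once, rather than raising on the first, lets a misconfigured worker report all of its problems at boot.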

Database pool guidance is now Rails-version-aware:

  • Rails 7.1 keeps the conservative async DB-pool validation
  • Rails 7.2+ uses the lower async-worker bound, since ordinary Active Record query paths can release connections between async waits

The README now documents the version-specific DB-pool guidance and calls out sticky Active Record APIs that can still pin connections.
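Both the lower Rails 7.2+ bound and the warning about sticky APIs come down to how long each fiber holds a connection. The simulation below is a deliberately simplified model with no database. `ConnectionCounter` and `peak_connections` are hypothetical, `Fiber.yield` marks an I/O wait, and the job count of 10 is arbitrary. When each job checks out a connection only for a query, peak usage stays at 1 however many fibers are in flight. When a job holds its connection across waits, as an open transaction would, peak usage equals the number of concurrent fibers. Queries in this model are instantaneous. With real non-blocking queries, the released case would be bounded by the number of queries actually in progress rather than by fiber count.

```ruby
# Simplified model with no database: counts how many connections are checked
# out at the same time. Fiber.yield marks an I/O wait (e.g. an HTTP call).
class ConnectionCounter
  attr_reader :peak

  def initialize
    @in_use = 0
    @peak = 0
  end

  # Hold one connection only for the duration of the block.
  def with_connection
    @in_use += 1
    @peak = [@peak, @in_use].max
    begin
      yield
    ensure
      @in_use -= 1
    end
  end
end

# Runs `count` jobs as fibers, round-robin on one thread. Each job waits three
# times; `pin_connection` decides whether it holds a connection while waiting.
def peak_connections(count, pin_connection:)
  pool = ConnectionCounter.new
  fibers = Array.new(count) do
    Fiber.new do
      if pin_connection
        # Sticky usage, like an open transaction: one connection held across waits.
        pool.with_connection { 3.times { Fiber.yield } }
      else
        # Released usage: check out for each query, then wait without a connection.
        3.times do
          pool.with_connection { :query }
          Fiber.yield
        end
      end
    end
  end

  until fibers.empty?
    fibers.each(&:resume)
    fibers.select!(&:alive?)
  end
  pool.peak
end

puts peak_connections(10, pin_connection: true)  # prints 10
puts peak_connections(10, pin_connection: false) # prints 1
```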

Why

The goal is to support cooperative, mostly I/O-bound job execution with lower thread overhead and a clearer concurrency model, while keeping the feature explicit and safe to configure.

Tests

Added and updated coverage for:

  • async pool behavior
  • async worker execution
  • worker metadata and wake-up behavior
  • async boot/shutdown/error handling
  • configuration validation
  • DB-pool validation and docs around Rails version differences

seanharmer commented

I've pushed my initial implementation for this to #729 for comparison. Feel free to take anything you deem useful. :)

crmne (author) commented Apr 5, 2026

Hi @rosa, I've published a benchmark harness that answers three questions:

  1. Within Solid Queue: how much do I/O-heavy workloads benefit from async execution compared to thread mode?
  2. DB pool ceiling: at what concurrency does thread mode exhaust the database connection pool, and how much further does async go?
  3. Across backends: how does Solid Queue compare to Async::Job + Redis when both run through ActiveJob?

https://github.com/crmne/solid_queue_bench

Solid Queue: async vs thread

async wins the majority of headline tests. Strongest gains by workload:

| Workload | Best delta | async win rate |
|---|---|---|
| sleep | +27.2% | 6/9 |
| async_http | +26.0% | 5/9 |
| cpu | +5.1% | 7/9 |
| ruby_llm_stream | +20.2% | 9/9 |

The cleanest result is ruby_llm_stream (real RubyLLM streaming plus Turbo broadcasts), where async wins every test. cpu stays roughly neutral (a best delta of only +5.1%). That is the expected control for CPU-bound work, and it makes the I/O gains more credible.

DB pool ceiling (stress suite)

The headline suite caps total concurrency to keep comparisons fair; the stress suite removes that cap. Without the cap, thread mode exhausted the DB connection pool after the baseline test (cap=25, proc=2) and failed every higher-concurrency cell. async completed all 10 planned tests per workload. It multiplexes fibers over a much smaller connection pool, so it keeps running where threads cannot.

Async::Job vs Solid Queue

Async::Job + Redis is faster across all shared tests (+7% to +213%). However, it is a different backend entirely, so treat it as a throughput-ceiling reference rather than a same-backend comparison.

Bottom line

The main async win is not raw throughput. It's good I/O performance without thread-sized DB pools and the connection pressure that comes with them. Under stress, that difference becomes binary: threads fail, fibers keep going.

crmne (author) commented Apr 7, 2026

I'm running this code in production at Chat with Work right now. I switched from Async::Job largely for operational reasons, especially the visibility Mission Control Jobs provides when debugging failures.
