Worker sometimes doesn’t pick up specific tasks (scheduler/API enqueued) until multiple restarts

### Summary
- Worker processes some tasks (e.g., `shipment_tasks.process_external_shipments`) but ignores tracking tasks from other modules (scheduler/API enqueued). After multiple Swarm restarts (remove/re-add service), worker starts consuming them. Removing `--max-tasks-per-child "5000"` fixes it reliably.
- Regression: stable for ~1 month; issue started yesterday. Today, after many restarts, tasks began processing again. I suspect tasks don’t register properly after child recycle.

### Environment
- Orchestrator: Docker Swarm
- OS: Linux 5.4.0-216-generic
- Python: 3.x
- TaskIQ: 0.11.18
- taskiq-redis: 1.1.1
- taskiq-rate-limiter: 0.1.0b2
- redis (python): 6.4.0
- Broker: `ListQueueBroker` (Redis), `queue_name="shipaxis"`
- Result backend: `RedisAsyncResultBackend` (Redis)
- Scheduler: `TaskiqScheduler` + `LabelScheduleSource`
- Env: `ENV=prod` (cron labels are gated on `PROJECT_ENV == "prod"`)

### Commands/Flags
- Worker:
```
taskiq worker app.taskiq.worker:taskiq_broker \
  --workers 4 \
  --tasks-pattern "**/*_tasks.py" \
  --fs-discover \
  --log-level INFO \
  --max-async-tasks 60 \
  --max-prefetch 10 \
  --ack-type when_executed \
  --max-tasks-per-child "5000"   # problematic
```
- Scheduler:
```
taskiq scheduler app.taskiq.scheduler:taskiq_scheduler \
  --tasks-pattern "**/*_tasks.py" \
  --fs-discover \
  --log-level INFO \
  --skip-first-run
```

### What happens
- Tracking tasks from courier modules (queued by scheduler or API) are enqueued to Redis but not consumed.
- Other tasks (from a different module) are consumed normally.
- After several Swarm restarts (remove service, re-add), tracking tasks begin processing.
- Removing `--max-tasks-per-child "5000"` makes the problem disappear across redeploys.

### Expected
- Worker consistently registers and processes all discovered tasks across modules, with or without child recycling.

### Actual
- With `--max-tasks-per-child` set, the worker intermittently fails to “see” or process tracking tasks from certain modules, while other tasks work.

### Repro steps (approx)
1. Run worker with `--fs-discover`, `--tasks-pattern "**/*_tasks.py"`, and `--max-tasks-per-child "5000"`.
2. Define tasks across multiple modules (e.g., `shipment_tasks.py` and `courier/*_tasks.py`).
3. Enqueue tasks from both groups (some via scheduler, some via API).
4. Observe that only the shipment task(s) run; tracking tasks remain queued.
5. Restart/recreate services multiple times in Swarm → tracking tasks eventually start running.
6. Remove `--max-tasks-per-child` → tracking tasks process reliably from first boot.

### Timeline/Regression
- Running fine for about a month.
- Started failing yesterday without intentional code changes.
- Today, after many restarts (rm/re-add in Swarm), tasks started processing again.
- Strong suspicion: tasks aren’t registered correctly after child recycle.

### Notes/Observations
- Using fs-discovery; if a module errors at import, tasks won’t register. No obvious import errors at INFO log level.
- Explicitly importing the task packages at worker startup (e.g., `import app.tasks` where `__init__` imports `courier/*_tasks.py`) makes behavior deterministic (tasks always register).
- Hypothesis: child recycling + fs-discovery/import timing leaves some children without the full registry after fork, or there’s an import/env race. Removing the per-child cap avoids recycle-related non-determinism.

### Questions
- Is `--max-tasks-per-child` supported with fs-discovery across multiple task modules?
- Should fs-discovery be avoided when using child recycling, and explicit imports preferred?
- Any recommended pattern to ensure a complete task registry after child recycle?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Worker sometimes doesn’t pick up specific tasks (scheduler/API enqueued) until multiple restarts #544

Summary

Environment

Commands/Flags

What happens

Expected

Actual

Repro steps (approx)

Timeline/Regression

Notes/Observations

Questions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Worker sometimes doesn’t pick up specific tasks (scheduler/API enqueued) until multiple restarts #544

Description

Summary

Environment

Commands/Flags

What happens

Expected

Actual

Repro steps (approx)

Timeline/Regression

Notes/Observations

Questions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions