
HA Sensor frequently crashes when using leader election #3760

@Joseph-Irving

Describe the bug
Since we turned on HA (more than one replica), our sensors crash with the following error at fairly random intervals, but often enough to be noticeable. The crash always comes right after the sensor logs that it has lost the leader election.

{"level":"info","ts":"2025-10-06T21:02:27.896064717Z","logger":"argo-events.sensor","caller":"leaderelection/leaderelection.go:176","msg":"Becoming a Follower, stand by ...","sensorName":"helm-build"}
{"level":"fatal","ts":"2025-10-06T21:02:27.89615058Z","logger":"argo-events.sensor","caller":"sensors/listener.go:80","msg":"leader lost: helm-build-sensor-rpcht-5f64f9cf97-dpbwb","sensorName":"helm-build","stacktrace":"github.com/argoproj/argo-events/pkg/sensors.(*SensorContext).Start.func2\n\t/home/runner/work/argo-events/argo-events/pkg/sensors/listener.go:80\ngithub.com/argoproj/argo-events/pkg/shared/leaderelection.(*natsEventBusElector).RunOrDie.func1\n\t/home/runner/work/argo-events/argo-events/pkg/shared/leaderelection/leaderelection.go:179\ngithub.com/argoproj/argo-events/pkg/shared/leaderelection.(*natsEventBusElector).RunOrDie\n\t/home/runner/work/argo-events/argo-events/pkg/shared/leaderelection/leaderelection.go:199\ngithub.com/argoproj/argo-events/pkg/sensors.(*SensorContext).Start\n\t/home/runner/work/argo-events/argo-events/pkg/sensors/listener.go:73\ngithub.com/argoproj/argo-events/pkg/sensors/cmd.Start\n\t/home/runner/work/argo-events/argo-events/pkg/sensors/cmd/start.go:85\ngithub.com/argoproj/argo-events/cmd/commands.init.0.NewSensorCommand.func2\n\t/home/runner/work/argo-events/argo-events/cmd/commands/sensor.go:14\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1019\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1071\ngithub.com/argoproj/argo-events/cmd/commands.Execute\n\t/home/runner/work/argo-events/argo-events/cmd/commands/root.go:19\nmain.main\n\t/home/runner/work/argo-events/argo-events/cmd/main.go:8\nruntime.main\n\t/opt/hostedtoolcache/go/1.24.4/x64/src/runtime/proc.go:283"}

We are using the JetStream NATS EventBus.

To Reproduce
Steps to reproduce the behavior:

Create an EventBus, e.g.:

---
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  annotations:
  name: test
  namespace: argo-workflows
spec:
  jetstream:
    version: 2.10.10

Create an EventSource; here's a simple one from your examples:

---
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: file
spec:
  eventBusName: test
  template:
    container:
      volumeMounts:
        - mountPath: /test-data
          name: test-data
    volumes:
      - name: test-data
        emptyDir: {}
  file:
    example:
      watchPathConfig:
        directory: /test-data/
        path: x.txt
      eventType: CREATE

Create a Sensor with more than one replica, again based on one of your examples:

---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: file
spec:
  template:
    serviceAccountName: operate-workflow-sa
  replicas: 2
  eventBusName: test
  dependencies:
    - name: test-dep
      eventSourceName: file
      eventName: example
  triggers:
    - template:
        name: file-workflow-trigger
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: file-watcher-
              spec:
                entrypoint: print-message
                templates:
                  - name: print-message
                    container:
                      image: busybox
                      command:
                        - echo
                      args:
                        - "hello"
          parameters:
            - src:
                dependencyName: test-dep
                dataKey: name
              dest: spec.templates.0.container.args.0
      retryStrategy:
        steps: 3

After an unpredictable amount of time the error occurs: sometimes almost immediately, sometimes not for hours.

Expected behavior
The Sensor should not crash.

Environment:

  • Kubernetes: v1.33.5
  • Argo WF: 3.7.2
  • Argo Events: v1.9.7

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)