
Fix: CloudWatch/Watchtower logs dropped in Task SDK due to handler lifetime bugs #66633

Open

korex-f wants to merge 2 commits into apache:main from korex-f:fix/remote-log-handler-premature-close

Conversation

korex-f (Contributor) commented May 9, 2026

Three related bugs all produce WatchtowerWarning: "Received message after logging system shutdown", causing CloudWatch log streams to be truncated or never created.

Bug 1 (task-sdk/src/airflow/sdk/log.py):
In configure_logging(), remote.processors was accessed before the inner configure_logging() call ran dictConfig(). dictConfig() calls _clearExistingHandlers(), which runs logging.shutdown() on all existing handlers, killing the just-created watchtower handler (setting shutting_down=True) before any task log is emitted. Fix: move the getattr(remote, "processors") lookup to after dictConfig runs and inject the processors via a second structlog.configure() call.
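A minimal sketch of the corrected ordering, assuming a `remote_io` object exposing a `processors` tuple; the config dict and processor list here are placeholders, not the Task SDK's actual logging configuration:

```python
import logging.config

import structlog


def configure_logging(remote_io=None):
    # Run dictConfig first: its _clearExistingHandlers() step shuts down any
    # handlers that already exist, so nothing handler-backed may be captured
    # before this point.
    logging.config.dictConfig({"version": 1, "disable_existing_loggers": False})
    structlog.configure(processors=[structlog.processors.JSONRenderer()])

    # Only now fetch the remote processors; the watchtower handler behind them
    # is created after dictConfig ran and therefore has not been shut down.
    remote_processors = getattr(remote_io, "processors", None)
    if remote_processors:
        current = structlog.get_config()["processors"]
        # Inject via a second structlog.configure() call, keeping the renderer last.
        structlog.configure(processors=[*remote_processors, *current])
```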

Bug 2 (providers/amazon/.../cloudwatch_task_handler.py):
CloudWatchRemoteLogIO.handler was a cached_property: once killed by logging.shutdown(), the dead instance was never replaced. The processors cached_property also captured _handler in a closure, pinning the dead instance. Fix: convert handler to a regular property that recreates the handler when shutting_down=True, and look up self.handler dynamically inside the processors closure.
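A sketch of that property pattern, assuming watchtower's CloudWatchLogHandler and its shutting_down flag; the constructor kwargs and the forward_to_cloudwatch processor are simplified placeholders rather than the provider's actual code:

```python
import logging

import watchtower


class CloudWatchRemoteLogIO:
    def __init__(self, log_group: str, log_stream: str):
        self.log_group = log_group
        self.log_stream = log_stream
        self._handler = None

    @property
    def handler(self) -> watchtower.CloudWatchLogHandler:
        # Recreate the handler if it has never been built, or if
        # logging.shutdown() already closed it (watchtower marks a closed
        # handler by setting shutting_down=True).
        if self._handler is None or getattr(self._handler, "shutting_down", False):
            self._handler = watchtower.CloudWatchLogHandler(
                log_group_name=self.log_group,
                log_stream_name=self.log_stream,
            )
        return self._handler

    @property
    def processors(self):
        def forward_to_cloudwatch(logger, method_name, event_dict):
            # Resolve self.handler on every call instead of capturing a handler
            # object in the closure, so a recreated handler is picked up.
            record = logging.makeLogRecord(
                {"msg": event_dict.get("event", ""), "levelno": logging.INFO, "levelname": "INFO"}
            )
            self.handler.emit(record)
            return event_dict

        return (forward_to_cloudwatch,)
```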

Bug 3 (task-sdk/src/airflow/sdk/execution_time/supervisor.py):
_configure_logging() returned a plain tuple with no cleanup path. The only code that ever closed the remote handler was logging.shutdown() at process exit, which fired after supervise_task() returned, while the final task log messages were still in flight. Fix: convert _configure_logging() to a context manager that explicitly flushes and closes the handler after process.wait() drains all task log messages.
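A sketch of the context-manager shape, with hypothetical remote_io and start_task_process names standing in for the supervisor's real internals:

```python
from contextlib import contextmanager


@contextmanager
def _configure_logging(remote_io):
    handler = remote_io.handler  # created up front, used while the task process runs
    try:
        yield handler
    finally:
        # Runs only after the caller's with-block has finished, i.e. after
        # process.wait() has drained all remaining task log messages.
        handler.flush()
        handler.close()


# Roughly how the supervising code would use it:
# with _configure_logging(remote_io):
#     process = start_task_process()
#     process.wait()
# # the remote handler is flushed and closed here, not at interpreter exit
```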

closes: #66475 (Cloudwatch remote logging does not work for ECS Executor)

