
Bugfix/63173 k8s unicode log read #63673

Merged
jscheffl merged 6 commits into apache:main from jscheffl:bugfix/63173-k8s-unicode-log-read on Mar 16, 2026

@jscheffl
Contributor

Follow-up of PR #63282 by @YoannAbriel - thanks for the pre-work!
As this issue is important to us, I am following up with a fix. I took the commits over into a new branch/PR and fixed the mypy and pytest failures. I hope to get it green and merged to main now.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

@boring-cyborg bot added the area:providers, provider:cncf-kubernetes and provider:google labels on Mar 15, 2026
YoannAbriel and others added 5 commits March 15, 2026 23:38
…read_logs

When pod output contains non-UTF-8 bytes (e.g. binary data from tqdm
progress bars, truncated multi-byte sequences at chunk boundaries),
kubernetes_asyncio's internal decode raises UnicodeDecodeError, crashing
the entire task.

Catch UnicodeDecodeError and retry with _preload_content=False to get
raw bytes, then decode with errors='replace' so logging continues
without killing the task.

Closes apache#63173
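The fallback described in this commit can be sketched without a cluster. This is a minimal sketch under stated assumptions, not the provider's actual `AsyncKubernetesHook.read_logs` code: `read_logs_tolerant`, `fake_decoded` and `fake_raw` are hypothetical names, the two callables stand in for the pod-log request with and without `_preload_content=False`, and the log payload is invented. The last lines show why a multi-byte character split at a chunk boundary breaks a strict UTF-8 decode.

```python
import asyncio
from typing import Awaitable, Callable


async def read_logs_tolerant(
    fetch_decoded: Callable[[], Awaitable[str]],
    fetch_raw: Callable[[], Awaitable[bytes]],
) -> list[str]:
    """Read pod logs, falling back to lossy decoding on invalid UTF-8.

    Hypothetical helper: fetch_decoded mimics the default (decoded) request,
    fetch_raw mimics the retry that returns undecoded bytes.
    """
    try:
        text = await fetch_decoded()
    except UnicodeDecodeError:
        # Retry for raw bytes and replace undecodable bytes with U+FFFD
        # instead of letting the exception kill the task.
        raw = await fetch_raw()
        text = raw.decode("utf-8", errors="replace")
    return text.splitlines()


# Invented payload: a progress-bar line containing stray 0xFF/0xFE bytes.
RAW_LOG = b"epoch 1\n\xff\xfe 50%|#####\nepoch 2\n"


async def fake_decoded() -> str:
    # Strict decode, mimicking the client's internal decode; raises here.
    return RAW_LOG.decode("utf-8")


async def fake_raw() -> bytes:
    return RAW_LOG


lines = asyncio.run(read_logs_tolerant(fake_decoded, fake_raw))
# The second line now starts with two U+FFFD replacement characters.
print(lines)

# "é" is b"\xc3\xa9"; split across two chunks, a strict per-chunk decode
# fails, while errors="replace" keeps going.
chunks = [b"caf\xc3", b"\xa9 done\n"]
try:
    chunks[0].decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)  # unexpected end of data
print([c.decode("utf-8", errors="replace") for c in chunks])
```

The trade-off of `errors='replace'` is fidelity for availability: undecodable bytes appear as U+FFFD in the task log instead of aborting the task.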
@jscheffl force-pushed the bugfix/63173-k8s-unicode-log-read branch from 6e01601 to 96eb41c on March 15, 2026 22:38
Contributor

@AutomationDev85 left a comment


Nice to see that this is now also fixed for the Kubernetes use case.

@jscheffl merged commit 36bbb0d into apache:main on Mar 16, 2026
208 of 209 checks passed
fat-catTW pushed a commit to fat-catTW/airflow that referenced this pull request Mar 22, 2026
* fix(providers/k8s): handle UnicodeDecodeError in AsyncKubernetesHook.read_logs

* Fix mypy

* Fix pytest in GKE

* Fix pytest in GKE, use AsyncMock

* Fix pytest in GKE, fix assert

---------

Co-authored-by: Yoann Abriel <yoann.abriel@gmail.com>
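The follow-up commits switch the GKE test to `AsyncMock`. A test for the fallback path could be shaped roughly as follows. This is a hedged sketch, not the test added in this PR: `read_pod_logs` is a hypothetical, simplified stand-in for the hook method, the mocked `api` only mimics kubernetes_asyncio's `read_namespaced_pod_log`, and it assumes the raw response exposes an async `read()` as aiohttp's `ClientResponse` does.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def read_pod_logs(api, name: str, namespace: str) -> list[str]:
    """Hypothetical, simplified stand-in for the hook's read_logs fallback."""
    try:
        text = await api.read_namespaced_pod_log(name=name, namespace=namespace)
    except UnicodeDecodeError:
        # Assumption: the raw response exposes an async read(), like
        # aiohttp's ClientResponse.
        response = await api.read_namespaced_pod_log(
            name=name, namespace=namespace, _preload_content=False
        )
        text = (await response.read()).decode("utf-8", errors="replace")
    return text.splitlines()


def test_read_pod_logs_falls_back_on_bad_utf8() -> None:
    raw_response = MagicMock()
    raw_response.read = AsyncMock(return_value=b"ok\n\xff bad byte\n")

    api = MagicMock()
    # The first awaited call raises; the retry returns the raw response.
    api.read_namespaced_pod_log = AsyncMock(
        side_effect=[
            UnicodeDecodeError("utf-8", b"\xff", 0, 1, "invalid start byte"),
            raw_response,
        ]
    )

    lines = asyncio.run(read_pod_logs(api, "my-pod", "default"))

    assert lines == ["ok", "\ufffd bad byte"]
    assert api.read_namespaced_pod_log.await_count == 2
    # The retry must request undecoded content.
    assert api.read_namespaced_pod_log.await_args.kwargs["_preload_content"] is False


test_read_pod_logs_falls_back_on_bad_utf8()
```

Using `AsyncMock` with a list `side_effect` lets one mock cover both the failing first call and the raw-bytes retry without a real API server.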

Development

Successfully merging this pull request may close these issues.

AirflowException in log handling when running KubernetesPodOperator in deferrable mode with logging interval

4 participants