feat: add GitHub App Authentication as an option for DagBundles#64422
feat: add GitHub App Authentication as an option for DagBundles#64422RaphCodec wants to merge 14 commits into
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
0639a51 to
d080795
Compare
|
@RaphCodec Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
There was a problem hiding this comment.
Pull request overview
Adds GitHub App-based authentication support to the Git provider’s GitHook so Git-backed DagBundles can authenticate to GitHub without SSH deploy keys (using an installation access token instead).
Changes:
- Extend
GitHookconnection extras/UI placeholders to accept GitHub App identifiers and generate an installation token for HTTPS cloning. - Add an optional dependency extra for GitHub App support (
pygithub/PyGithub) in the git provider. - Update the workspace lockfile to include the new optional extra and reflect a refreshed dependency resolution.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
providers/git/src/airflow/providers/git/hooks/git.py |
Adds GitHub App auth fields and token generation logic to rewrite HTTPS repo URLs with an installation token. |
providers/git/pyproject.toml |
Introduces an optional dependency extra for GitHub App support. |
uv.lock |
Updates the lockfile to include the new optional extra and updated resolved packages. |
1abb9fb to
23cf74d
Compare
687fb39 to
533e92f
Compare
|
Thanks for the feedback. I went over the code more thourghly, improved it and updated the branch. I will try to update the branch at least once a day to prevent merge conflicts. |
27884e7 to
a2d5a9a
Compare
|
@RaphCodec Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
e432d3f to
7ed0886
Compare
|
Quickest fix: git fetch upstream main && git rebase upstream/main
rm uv.lock && uv lock
git add uv.lock && git rebase --continue
git push --force-with-leaseAutomated nudge — ignore if you're not ready to rebase. This comment is updated in place on future |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
copilot suggestion Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nstallation id Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ith_key_file_reads_file Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… treated as an ssh key by mistake
7ed0886 to
a6458ea
Compare
…aises file_pem_key_content. validated github app ids and preserved private key
potiuk
left a comment
There was a problem hiding this comment.
Overview
Adds a third auth option (alongside SSH key and PAT) for the git provider's GitHook/DagBundle: GitHub App authentication. Adds [github] extra pulling in PyGithub>=2.1.1. Connection extras gain github_client_id + github_installation_id; the existing private_key/key_file carry the App's PEM. On hook init, mints an installation token via GithubIntegration.get_access_token(...) and injects x-access-token:<token>@ into the HTTPS repo URL.
High-impact issues (must fix)
1. Field name is misleading and the int() parse rejects valid Client IDs
The field is called github_client_id (with docstring "GitHub Client ID (or App ID)"). These are different things in GitHub:
- App ID is numeric (
12345) - Client ID is alphanumeric (
Iv23li...)
hooks/git.py does int(raw_github_client_id), which rejects every real Client ID. PyGithub also accepts Client IDs as strings in modern versions. Either:
- Rename the field to
github_app_idand document only the integer form, or - Accept the value as
str, let PyGithub validate, drop theint()parse.
2. Token never refreshes — bundle stops working after ~1 hour
__init__ mints the installation token once and stores it in self.repo_url. GitHub App installation tokens expire after 60 minutes. DagBundle hooks are typically held by a long-lived scheduler/worker process, so git fetch will start failing on every refresh after the first hour with no recovery path. There's no _refresh_token() and configure_hook_env/_process_git_auth_url are not re-called.
This needs a refresh mechanism (re-mint when expiry is near, ideally inside configure_hook_env or a @property that lazily refreshes).
3. Token persisted in repo_url is a logging/leak risk
Inlining https://x-access-token:<token>@host/... into self.repo_url means the token will appear in any log line that prints the URL (git's own output, exception messages, etc.), and it stays there for the life of the hook instance even after expiry. Consider keeping the bare URL on the hook and only injecting the token into the git subprocess env (GIT_ASKPASS or credential helper) on demand.
4. Network call in __init__
_get_github_app_token() does an HTTP call to GitHub during hook construction. If GitHub is briefly down, scheduler bootstrap fails. Defer the token fetch to first use (lazy property / configure_hook_env).
Medium-impact issues
5. Inconsistent exception types
if self.key_file and self.private_key: raise AirflowException(...)(existing line, AirflowException)- New code raises
ValueErrorfor missing/invalid GitHub-App config andOSErrorfor unreadable key file.
Pick one. AirflowException matches the surrounding hook convention.
6. Empty-string handling
extra.get("github_client_id") returns "" (not None) when a user clears the UI field. int("") raises ValueError. Use if not raw_github_client_id: to treat empty as "not set".
7. Unrelated change in _private_ui.yaml
Removes the description: for ExtraMenuItem — unrelated to the PR's purpose, looks like accidental codegen drift. Revert.
8. uv.lock churn (245+/-238) is huge for one optional dep
Adding PyGithub to an optional extra shouldn't move 483 lock lines. Suggests the lockfile was regenerated against a different Python/uv state. Worth re-running uv lock --no-update from a clean baseline and re-pushing only the PyGithub-relevant delta.
9. Tests mock the integration entirely
Every interesting test does mp.setattr(GitHook._get_github_app_token, lambda self: ...). There's no test that actually exercises the GithubIntegration(...) call shape, so a PyGithub API change (e.g. parameter rename in a future v3) breaks silently. At minimum, a test using mock.patch("github.GithubIntegration") to assert the call args (kwargs integration_id=, private_key=, installation_id=) would prevent that.
Minor / style
_get_github_app_tokenlacks a return-type annotation; rest of the class is annotated.- Tests use
with pytest.MonkeyPatch().context() as mp:instead of the conventionalmonkeypatchfixture argument. getattr(self, \"github_app_private_key\", None)inconfigure_hook_envis defensive against an attribute that's now always set in__init__— thegetattrdefault is dead.PyGithub>=2.1.1is unbounded above; a,<3upper cap would protect against major-version breakage in a security-sensitive dep.- Missing newsfragment in
airflow-core/newsfragments/64422.significant.rst(the PR template flags this as required for user-facing features). - No connection-docs update under
providers/git/docs/— the newextraskeys aren't documented for end users.
Verdict
Direction is right (GitHub App auth is the correct production answer), but issues #1 (field name / int parse) and #2 (no token refresh) are real correctness blockers — the feature as written either won't accept modern GitHub App credentials or will silently break DagBundle pulls after an hour. Issue #3 is a security concern. Recommend: address #1–#4 + add the refresh test before merging.
potiuk
left a comment
There was a problem hiding this comment.
Second-read pass via Codex (adversarial review). Independent of the primary review above; both reads converge on the token-lifecycle concern.
Codex Adversarial Review
Target: branch diff against main
Verdict: needs-attention
No-ship: the GitHub App auth path can leak installation tokens over plain HTTP and appears to persist one-hour tokens into Git remotes without a refresh path.
Findings:
- [high] GitHub App tokens are allowed in plain HTTP repository URLs (providers/git/src/airflow/providers/git/hooks/git.py:159-162)
GitHub App auth is documented here as requiring an HTTPS repository URL, but the guard accepts bothhttps://andhttp://._process_git_auth_url()then embeds the generated installation token into the URL, so a misconfiguredhttp://github.com/...or enterprise URL can put a live installation token on a plaintext request or redirect path. This is a credential exposure risk introduced by this auth mode.
Recommendation: Reject anything excepthttps://when GitHub App auth is enabled; do not permithttp://for generated installation tokens. - [high] Installation token is generated once and persisted into Git remotes (providers/git/src/airflow/providers/git/hooks/git.py:167-171)
The hook fetches a GitHub App installation token during__init__, stores it asauth_token, and then rewritesrepo_urlwith that token. Inference fromGitDagBundle: the first clone stores this URL as the bare repository'sorigin, and later refresh/fetch operations use the existing remote rather than rebuilding credentials. GitHub installation tokens expire, so long-lived Dag bundle directories can start failing fetches after token expiry even though the app key is still valid.
Recommendation: Do not persist installation tokens in the remote URL. Refresh the token immediately before each network operation and pass it through a non-persistent credential mechanism, or update the remote URL with a fresh token before every fetch/clone and scrub it afterward.
Purpose
Adding GitHub App Authentication to the Git Provider. This feature is intended to be a security benefit for those who want to use GitHub Dag Bundles in Airflow 3, but for whatever reason cannot use a SSH deploy key to authenticate to GitHub and would prefer not to use their own PAT.
In case of an existing issue, reference it using one of the following:
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.