feat: add GitHub App Authentication as an option for DagBundles #64422

Open
RaphCodec wants to merge 14 commits into apache:main from RaphCodec:feat/git-bundles-github-app-auth

Conversation

@RaphCodec RaphCodec commented Mar 29, 2026

Purpose

Adds GitHub App authentication to the Git provider. This feature is intended as a security benefit for those who want to use GitHub Dag Bundles in Airflow 3 but cannot use an SSH deploy key to authenticate to GitHub and would prefer not to use their own PAT.


In case of an existing issue, reference it using one of the following:


Was generative AI tooling used to co-author this PR?
  • Yes - Claude Sonnet 4.6

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

boring-cyborg Bot commented Mar 29, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@RaphCodec RaphCodec marked this pull request as draft March 29, 2026 21:22
@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch from 0639a51 to d080795 Compare March 29, 2026 22:53
@RaphCodec RaphCodec marked this pull request as ready for review March 29, 2026 22:55
@potiuk potiuk marked this pull request as draft April 1, 2026 19:04
Member

potiuk commented Apr 1, 2026

@RaphCodec Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Merge conflicts: This PR has merge conflicts with the main branch. Your branch is 90 commits behind main. Please rebase your branch (git fetch origin && git rebase origin/main), resolve the conflicts, and push again. See contributing quick start.

Note: Your branch is 90 commits behind main. Please rebase and push again to get up-to-date CI results.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

Contributor

Copilot AI left a comment

Pull request overview

Adds GitHub App-based authentication support to the Git provider’s GitHook so Git-backed DagBundles can authenticate to GitHub without SSH deploy keys (using an installation access token instead).

Changes:

  • Extend GitHook connection extras/UI placeholders to accept GitHub App identifiers and generate an installation token for HTTPS cloning.
  • Add an optional dependency extra for GitHub App support (pygithub / PyGithub) in the git provider.
  • Update the workspace lockfile to include the new optional extra and reflect a refreshed dependency resolution.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| providers/git/src/airflow/providers/git/hooks/git.py | Adds GitHub App auth fields and token generation logic to rewrite HTTPS repo URLs with an installation token. |
| providers/git/pyproject.toml | Introduces an optional dependency extra for GitHub App support. |
| uv.lock | Updates the lockfile to include the new optional extra and updated resolved packages. |

Comment thread providers/git/src/airflow/providers/git/hooks/git.py Outdated
Comment thread providers/git/src/airflow/providers/git/hooks/git.py Outdated
Comment thread providers/git/src/airflow/providers/git/hooks/git.py Outdated
Comment thread providers/git/src/airflow/providers/git/hooks/git.py Outdated
Comment thread providers/git/src/airflow/providers/git/hooks/git.py
Comment thread providers/git/pyproject.toml Outdated
@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch 5 times, most recently from 1abb9fb to 23cf74d Compare April 19, 2026 15:09
@RaphCodec RaphCodec requested a review from Copilot April 19, 2026 15:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

Comment thread providers/git/src/airflow/providers/git/hooks/git.py Outdated
Comment thread providers/git/tests/unit/git/hooks/test_git.py Outdated
@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch 3 times, most recently from 687fb39 to 533e92f Compare April 19, 2026 16:32
@RaphCodec
Author

Thanks for the feedback. I went over the code more thoroughly, improved it, and updated the branch. I will try to update the branch at least once a day to prevent merge conflicts.

@RaphCodec RaphCodec marked this pull request as ready for review April 19, 2026 16:40
@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch 4 times, most recently from 27884e7 to a2d5a9a Compare April 24, 2026 10:20
@potiuk potiuk marked this pull request as draft April 27, 2026 23:20
Member

potiuk commented Apr 27, 2026

@RaphCodec Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Merge conflicts: This PR has merge conflicts with main. GitHub's "Update branch" button cannot resolve them — please rebase locally and resolve the conflicts manually:
    git fetch upstream main
    git rebase upstream/main
    # resolve conflicts, then:
    git push --force-with-lease
    
  • Failing CI checks: Failing: Build info. This is most likely a downstream effect of the merge conflicts above — once the rebase is clean, please re-trigger CI by pushing the rebased branch.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch 5 times, most recently from e432d3f to 7ed0886 Compare May 9, 2026 11:56
Contributor

github-actions Bot commented May 9, 2026

uv.lock on main just moved via #66980 ("[main] Upgrade important CI environment", commit 0c4c1f8), and this PR currently conflicts with it.

Quickest fix:

git fetch upstream main && git rebase upstream/main
rm uv.lock && uv lock
git add uv.lock && git rebase --continue
git push --force-with-lease

Automated nudge — ignore if you're not ready to rebase. This comment is updated in place on future uv.lock bumps.

@RaphCodec RaphCodec force-pushed the feat/git-bundles-github-app-auth branch from 7ed0886 to a6458ea Compare May 9, 2026 20:08
…aises file_pem_key_content. validated github app ids and preserved private key
Member

@potiuk potiuk left a comment

Overview

Adds a third auth option (alongside SSH key and PAT) for the git provider's GitHook/DagBundle: GitHub App authentication. Adds [github] extra pulling in PyGithub>=2.1.1. Connection extras gain github_client_id + github_installation_id; the existing private_key/key_file carry the App's PEM. On hook init, mints an installation token via GithubIntegration.get_access_token(...) and injects x-access-token:<token>@ into the HTTPS repo URL.

High-impact issues (must fix)

1. Field name is misleading and the int() parse rejects valid Client IDs

The field is called github_client_id (with docstring "GitHub Client ID (or App ID)"). These are different things in GitHub:

  • App ID is numeric (12345)
  • Client ID is alphanumeric (Iv23li...)

hooks/git.py does int(raw_github_client_id), which rejects every real Client ID. PyGithub also accepts Client IDs as strings in modern versions. Either:

  • Rename the field to github_app_id and document only the integer form, or
  • Accept the value as str, let PyGithub validate, drop the int() parse.
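
As a rough illustration of the second option, the identifier can be carried as a string from end to end. This is a minimal sketch, not the PR's code: `coerce_app_identifier` is a hypothetical helper, and the `Iv23liEXAMPLE` value mentioned below is a made-up placeholder.

```python
from __future__ import annotations


def coerce_app_identifier(raw: int | str) -> str:
    """Return the GitHub App identifier as a stripped string.

    Hypothetical helper: it deliberately avoids int(), so both a numeric
    App ID and an alphanumeric Client ID are accepted and validation is
    left to the GitHub client library.
    """
    identifier = str(raw).strip()
    if not identifier:
        raise ValueError("GitHub App identifier must not be empty")
    return identifier
```

With this shape, `coerce_app_identifier(12345)` yields `"12345"` and a value such as `"Iv23liEXAMPLE"` passes through unchanged, whereas calling `int()` on the latter raises `ValueError`.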

2. Token never refreshes — bundle stops working after ~1 hour

__init__ mints the installation token once and stores it in self.repo_url. GitHub App installation tokens expire after 60 minutes. DagBundle hooks are typically held by a long-lived scheduler/worker process, so git fetch will start failing on every refresh after the first hour with no recovery path. There's no _refresh_token() and configure_hook_env/_process_git_auth_url are not re-called.

This needs a refresh mechanism (re-mint when expiry is near, ideally inside configure_hook_env or a @property that lazily refreshes).
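
One possible shape for such a refresh mechanism is sketched below. It is an assumption-laden illustration rather than the hook's API: `CachedToken`, `RefreshingToken`, the `mint` callable, and the 300-second safety margin are all hypothetical names and values, and the real mint function would wrap whatever call the hook makes to GitHub.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedToken:
    value: str
    expires_at: float  # epoch seconds


class RefreshingToken:
    """Mint a token lazily and re-mint it shortly before it expires."""

    def __init__(
        self,
        mint: Callable[[], CachedToken],
        skew_seconds: float = 300.0,  # assumed safety margin before expiry
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._mint = mint
        self._skew = skew_seconds
        self._clock = clock
        self._cached: CachedToken | None = None

    def get(self) -> str:
        cached = self._cached
        # Re-mint when nothing is cached yet or the token is close to expiry.
        if cached is None or self._clock() >= cached.expires_at - self._skew:
            cached = self._mint()
            self._cached = cached
        return cached.value
```

Because nothing is minted until `get()` is first called, constructing the object performs no network I/O; a caller would invoke `get()` immediately before each fetch or clone.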

3. Token persisted in repo_url is a logging/leak risk

Inlining https://x-access-token:<token>@host/... into self.repo_url means the token will appear in any log line that prints the URL (git's own output, exception messages, etc.), and it stays there for the life of the hook instance even after expiry. Consider keeping the bare URL on the hook and only injecting the token into the git subprocess env (GIT_ASKPASS or credential helper) on demand.
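
A minimal sketch of the `GIT_ASKPASS` route follows, assuming a POSIX shell is available. `build_git_env`, the script name `askpass.sh`, and the `GITHUB_APP_TOKEN` variable are hypothetical; `GIT_ASKPASS` and `GIT_TERMINAL_PROMPT` are git's own environment variables. Git runs the askpass program with the prompt text as its argument and reads the answer from stdout, so the token never has to appear in the repository URL.

```python
from __future__ import annotations

import os
import stat

# The helper answers the username prompt with a fixed value and any other
# prompt with the token taken from the environment; the token itself is
# never written into the script file.
ASKPASS_SCRIPT = """#!/bin/sh
case "$1" in
    Username*) printf '%s\\n' "x-access-token" ;;
    *) printf '%s\\n' "$GITHUB_APP_TOKEN" ;;
esac
"""


def build_git_env(token: str, script_dir: str) -> dict[str, str]:
    """Return an environment for a git subprocess that supplies the token on demand."""
    script_path = os.path.join(script_dir, "askpass.sh")
    with open(script_path, "w", encoding="utf-8") as handle:
        handle.write(ASKPASS_SCRIPT)
    os.chmod(script_path, stat.S_IRWXU)  # owner-only read/write/execute

    env = dict(os.environ)
    env["GIT_ASKPASS"] = script_path
    env["GITHUB_APP_TOKEN"] = token  # hypothetical variable read by the script
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of prompting on a terminal
    return env
```

The returned mapping would be passed as `env=` to the git subprocess for a single operation and discarded afterwards. The token is still visible to that child process's environment, so this narrows the exposure rather than eliminating it.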

4. Network call in __init__

_get_github_app_token() does an HTTP call to GitHub during hook construction. If GitHub is briefly down, scheduler bootstrap fails. Defer the token fetch to first use (lazy property / configure_hook_env).

Medium-impact issues

5. Inconsistent exception types

  • if self.key_file and self.private_key: raise AirflowException(...) (existing line, AirflowException)
  • New code raises ValueError for missing/invalid GitHub-App config and OSError for unreadable key file.

Pick one. AirflowException matches the surrounding hook convention.

6. Empty-string handling

extra.get("github_client_id") returns "" (not None) when a user clears the UI field. int("") raises ValueError. Use if not raw_github_client_id: to treat empty as "not set".

7. Unrelated change in _private_ui.yaml

Removes the description: for ExtraMenuItem — unrelated to the PR's purpose, looks like accidental codegen drift. Revert.

8. uv.lock churn (+245/-238) is huge for one optional dep

Adding PyGithub to an optional extra shouldn't move 483 lock lines. This suggests the lockfile was regenerated against a different Python/uv state. Worth re-running uv lock (without --upgrade, so existing pins are preserved) from a clean baseline and re-pushing only the PyGithub-relevant delta.

9. Tests mock the integration entirely

Every interesting test does mp.setattr(GitHook._get_github_app_token, lambda self: ...). There's no test that actually exercises the GithubIntegration(...) call shape, so a PyGithub API change (e.g. parameter rename in a future v3) breaks silently. At minimum, a test using mock.patch("github.GithubIntegration") to assert the call args (kwargs integration_id=, private_key=, installation_id=) would prevent that.
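
A call-shape test of that kind might look roughly like the sketch below. `mint_installation_token` is a hypothetical stand-in for the hook's helper, and it takes the integration class as a parameter so the example runs without PyGithub installed; in the real suite the class would instead be replaced with `mock.patch` at the path the hook imports it from. The keyword names mirror those listed in this review and are assumptions about the hook's call, not a statement of PyGithub's current signature.

```python
from unittest import mock


def mint_installation_token(integration_cls, app_id, private_key, installation_id):
    """Hypothetical stand-in for the hook's token helper."""
    integration = integration_cls(integration_id=app_id, private_key=private_key)
    return integration.get_access_token(installation_id).token


def test_integration_call_shape():
    fake_cls = mock.MagicMock(name="GithubIntegration")
    fake_cls.return_value.get_access_token.return_value.token = "ghs_example"

    token = mint_installation_token(fake_cls, "12345", "PEM-CONTENT", 67890)

    # The assertions pin the constructor kwargs and the method argument, so a
    # renamed parameter in the call site fails the test instead of slipping by.
    assert token == "ghs_example"
    fake_cls.assert_called_once_with(integration_id="12345", private_key="PEM-CONTENT")
    fake_cls.return_value.get_access_token.assert_called_once_with(67890)
```

Unlike replacing `_get_github_app_token` wholesale, this keeps the helper's body under test and only fakes the third-party class.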

Minor / style

  • _get_github_app_token lacks a return-type annotation; rest of the class is annotated.
  • Tests use with pytest.MonkeyPatch().context() as mp: instead of the conventional monkeypatch fixture argument.
  • getattr(self, "github_app_private_key", None) in configure_hook_env is defensive against an attribute that's now always set in __init__, so the getattr default is dead.
  • PyGithub>=2.1.1 is unbounded above; a ,<3 upper cap would protect against major-version breakage in a security-sensitive dep.
  • Missing newsfragment in airflow-core/newsfragments/64422.significant.rst (the PR template flags this as required for user-facing features).
  • No connection-docs update under providers/git/docs/ — the new extras keys aren't documented for end users.

Verdict

Direction is right (GitHub App auth is the correct production answer), but issues #1 (field name / int parse) and #2 (no token refresh) are real correctness blockers: the feature as written either won't accept modern GitHub App credentials or will silently break DagBundle pulls after an hour. Issue #3 is a security concern. Recommend: address #1 through #4 and add the refresh test before merging.

Member

@potiuk potiuk left a comment

requesting changes.

Member

@potiuk potiuk left a comment

Second-read pass via Codex (adversarial review). Independent of the primary review above; both reads converge on the token-lifecycle concern.

Codex Adversarial Review

Target: branch diff against main
Verdict: needs-attention

No-ship: the GitHub App auth path can leak installation tokens over plain HTTP and appears to persist one-hour tokens into Git remotes without a refresh path.

Findings:

  • [high] GitHub App tokens are allowed in plain HTTP repository URLs (providers/git/src/airflow/providers/git/hooks/git.py:159-162)
    GitHub App auth is documented here as requiring an HTTPS repository URL, but the guard accepts both https:// and http://. _process_git_auth_url() then embeds the generated installation token into the URL, so a misconfigured http://github.com/... or enterprise URL can put a live installation token on a plaintext request or redirect path. This is a credential exposure risk introduced by this auth mode.
    Recommendation: Reject anything except https:// when GitHub App auth is enabled; do not permit http:// for generated installation tokens.
  • [high] Installation token is generated once and persisted into Git remotes (providers/git/src/airflow/providers/git/hooks/git.py:167-171)
    The hook fetches a GitHub App installation token during __init__, stores it as auth_token, and then rewrites repo_url with that token. Inference from GitDagBundle: the first clone stores this URL as the bare repository's origin, and later refresh/fetch operations use the existing remote rather than rebuilding credentials. GitHub installation tokens expire, so long-lived Dag bundle directories can start failing fetches after token expiry even though the app key is still valid.
    Recommendation: Do not persist installation tokens in the remote URL. Refresh the token immediately before each network operation and pass it through a non-persistent credential mechanism, or update the remote URL with a fresh token before every fetch/clone and scrub it afterward.
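
The first recommendation can be sketched as a scheme check run before any token is generated. This is an illustrative guard only: `require_https` is a hypothetical name, and `ValueError` is used to keep the example self-contained where the hook would presumably raise its own exception type.

```python
from urllib.parse import urlsplit


def require_https(repo_url: str) -> str:
    """Return repo_url unchanged if its scheme is https, otherwise raise."""
    scheme = urlsplit(repo_url).scheme.lower()
    if scheme != "https":
        # Report only the scheme so a URL carrying credentials is never echoed.
        raise ValueError(
            f"GitHub App auth requires an https:// repository URL, got scheme {scheme!r}"
        )
    return repo_url
```

An `http://` URL and an SCP-style `git@host:org/repo.git` address (which parses with an empty scheme) are both rejected, so no installation token is minted for a transport that could expose it.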

Development

Successfully merging this pull request may close these issues.

Git Provider GitHub App Authentication

3 participants