Skip to content

Improve transactional integrity in db utils#63663

Open
Gautam-Bharadwaj wants to merge 6 commits intoapache:mainfrom
Gautam-Bharadwaj:fix-db-utils-transactional-logic
Open

Improve transactional integrity in db utils#63663
Gautam-Bharadwaj wants to merge 6 commits intoapache:mainfrom
Gautam-Bharadwaj:fix-db-utils-transactional-logic

Conversation

@Gautam-Bharadwaj
Copy link
Copy Markdown
Contributor

@Gautam-Bharadwaj Gautam-Bharadwaj commented Mar 15, 2026

Refactored merge_conn and add_default_pool_if_not_exists to remove direct calls to session.commit(). These utility functions should remain atomic and leave transaction management to the caller to avoid partial state commits in case of downstream failures.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Mar 16, 2026

@Gautam-Bharadwaj This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria.

Issues found:

  • ⚠️ Testing Requirements: The PR changes behavior in merge_conn and add_default_pool_if_not_exists by replacing session.commit() with session.flush(). This is a potentially breaking change for callers that relied on the commit being issued inside these functions. No new or updated tests are included to demonstrate that the behavior is correct under the new semantics (e.g., that callers still see persisted data after calling these utilities). A regression test that exercises the @provide_session-managed path without an outer transaction, as well as the caller-managed-session path, would be appropriate.

Note: Your branch is 21 commits behind main. Some check failures may be caused by changes in the base branch rather than by your PR. Please rebase your branch and push again to get up-to-date CI results.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates.
  • Maintainers will then proceed with a normal review.

There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.

@Gautam-Bharadwaj Gautam-Bharadwaj force-pushed the fix-db-utils-transactional-logic branch from 46bdd7d to 010b25a Compare March 16, 2026 07:45
@Gautam-Bharadwaj
Copy link
Copy Markdown
Contributor Author

Gautam-Bharadwaj commented Mar 16, 2026

@potiuk Thanks for the feedback Jarek! I've rebased the branch onto latest main and added regression tests in test_db.py. The new tests verify that merge_conn and add_default_pool_if_not_exists correctly commit when managed by @provide_session, but only flush (and don't commit) when an external session is provided. I've also added an improvement newsfragment. Ready for review! 👋

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Mar 16, 2026

Why do you think this should happen? I completely do not understand why you would like to do it this way.

@potiuk potiuk marked this pull request as draft March 16, 2026 09:47
@potiuk
Copy link
Copy Markdown
Member

potiuk commented Mar 16, 2026

Converted to Draft, but I fail to see why you would like to not commit this message if session is provided from outside - that makes very little sense because the idea is that those methods actually commit the change - even if it means that they commit other things that were changed before.

Do you have a good reason for that change or is it just generic thinking that it should be like that ? Is there a problem you are trying to solve ? Because it's far from refactoring. This is changing semantics of those methods.

Did you evaluate what effects it has on all the places where those methods are used?

@Gautam-Bharadwaj
Copy link
Copy Markdown
Contributor Author

Gautam-Bharadwaj commented Mar 16, 2026

@potiuk The primary goal here is to make these utility functions atomic and transaction-safe when called within a larger session (like initdb or create_default_connections).

Currently, merge_conn is called in a loop inside create_default_connections. If merge_conn contains a commit(), it forces a commit on every single iteration. This is not only slow but also prevents create_default_connections from being an atomic operation if it fails halfway, the first half remains committed. By moving to flush(), we allow the outer @provide_session (in create_default_connections) to manage the single final commit.

Importantly, this change is largely backwards compatible for standalone callers. Because these functions are decorated with @provide_session, if they are called without an external session, the wrapper will create a new session, call the function, and then commit it upon exit. So merge_conn(conn) will still result in a persistent change in the DB immediately.

The only difference is when a session is passed from outside and in that case, the caller typically expects to control the transaction boundary. Does this make sense, or do you see a specific case where a nested commit is preferred?

@Gautam-Bharadwaj Gautam-Bharadwaj force-pushed the fix-db-utils-transactional-logic branch from 9b0b4cf to 2403e43 Compare March 17, 2026 14:13
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because the author has not responded to a request for more information. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 26, 2026
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 2, 2026
@github-actions github-actions bot removed stale Stale PRs per the .github/workflows/stale.yml policy file pending-response labels Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants