FEAT: token-expiry capture and identity-aware pool key#660
Conversation
) Foundation for identity-aware connection pooling. No behavior change; all additions are dormant until the connection/native layers consume them. - Capture token expiry: _acquire_token returns (struct, raw, expires_on); add TokenInfo, AADAuth.get_token_info, module-level get_auth_token_info. Public get_token/get_raw_token/get_token_struct signatures unchanged. - compute_identity_key(): per-identity pool-key discriminator. None for non-token auth (bare-connStr key, backward compatible); msi:<client_id>/msi:system derived without a token (enables #659 skip-on-hit); tok:<sha256> fail-safe hash otherwise. - is_token_near_expiry(): 5-min single-threshold check; None expiry treated as near (fail safe). - Tests: TestTokenExpiryCapture, TestComputeIdentityKey, TestIsTokenNearExpiry (110 auth tests pass).
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/connection/connection.cppLines 570-578 570 _conn->connect(connect_attrs);
571 _conn->setTokenExpiry(expiry);
572 } else {
573 _conn->connect(attrsBefore);
! 574 }
575 }
576 }
577
578 ConnectionHandle::~ConnectionHandle() {mssql_python/pybind/connection/connection_pool.cppLines 20-36 20 // hold the GIL. Keys in the dict are attribute ids (ints). SQL_COPT_SS_ACCESS_TOKEN
21 // is defined once in connection.h (shared with connection.cpp).
22 static std::string extractAccessToken(const py::dict& attrs) {
23 for (auto item : attrs) {
! 24 if (py::isinstance<py::int_>(item.first) &&
! 25 item.first.cast<long>() == SQL_COPT_SS_ACCESS_TOKEN) {
! 26 try {
! 27 return item.second.cast<std::string>();
! 28 } catch (const py::cast_error&) {
! 29 return std::string();
! 30 }
! 31 }
! 32 }
33 return std::string();
34 }
35
36 ConnectionPool::ConnectionPool(size_t max_size, int idle_timeout_secs)Lines 98-107 98 // Pool is full — throw immediately. Another thread may be
99 // validating a popped candidate outside the mutex right now, so
100 // a transient "pool full" is an acceptable trade-off that
101 // callers can retry.
! 102 throw std::runtime_error(
! 103 "ConnectionPool::acquire: pool size limit reached");
104 }
105 candidate = _pool.front();
106 _pool.pop_front();
107 }Lines 106-114 106 _pool.pop_front();
107 }
108
109 // Validate the candidate outside the mutex.
! 110 bool reuse_candidate = false;
111 try {
112 if (token_factory && !token_factory.is_none() &&
113 candidate->isTokenNearExpiry(TOKEN_EXPIRY_THRESHOLD_SECS)) {
114 // Expiry-aware checkout with token compare: the pooled token isLines 123-132 123 py::dict fresh_attrs =
124 Connection::invokeTokenFactory(token_factory, fresh_expiry);
125 std::string fresh_token = extractAccessToken(fresh_attrs);
126 if (!fresh_token.empty() && fresh_token == candidate->currentAccessToken()) {
! 127 candidate->setTokenExpiry(fresh_expiry);
! 128 reuse_candidate = candidate->isAlive() && candidate->reset();
129 } else {
130 // Token rotated — remember the fresh token to reopen with,
131 // and eagerly drain the sibling idle connections that still
132 // hold the now-stale token. They were all minted from theLines 139-160 139 pending_expiry = fresh_expiry;
140 have_pending_token = true;
141 const std::string stale_token = candidate->currentAccessToken();
142 if (!stale_token.empty()) {
! 143 std::lock_guard<std::mutex> lock(_mutex);
! 144 _pool.erase(
! 145 std::remove_if(
! 146 _pool.begin(), _pool.end(),
! 147 [&](const std::shared_ptr<Connection>& sibling) {
! 148 if (sibling->currentAccessToken() == stale_token) {
! 149 to_disconnect.push_back(sibling);
! 150 if (_current_size > 0) --_current_size;
! 151 return true;
! 152 }
! 153 return false;
! 154 }),
! 155 _pool.end());
! 156 }
157 }
158 } else {
159 reuse_candidate = candidate->isAlive() && candidate->reset();
160 }Lines 263-283 263 --_current_size;
264 }
265 }
266
! 267 bool ConnectionPool::canEvict() {
! 268 std::lock_guard<std::mutex> lock(_mutex);
269 // Never evict while any connection is checked out or in-flight. Reserved
270 // capacity (_current_size) beyond what is sitting idle in _pool means a
271 // caller still holds one, so the pool must stay.
! 272 size_t checked_out = (_current_size > _pool.size()) ? (_current_size - _pool.size()) : 0;
! 273 if (checked_out > 0) {
! 274 return false;
! 275 }
276 // Nothing checked out and the pool is empty: safe to drop immediately.
! 277 if (_pool.empty()) {
! 278 return true;
! 279 }
280 // Nothing checked out but idle connections remain. Evict the whole pool
281 // only once EVERY pooled connection has been idle longer than the idle
282 // timeout. This is what reclaims pools for rotating / single-use identities
283 // (e.g. per-request Entra users keyed by token hash): such a pool is neverLines 283-300 283 // (e.g. per-request Entra users keyed by token hash): such a pool is never
284 // acquired again, so its idle connection is never pruned by acquire() and
285 // _current_size would otherwise stay > 0 forever. Evaluating the idle
286 // timeout here lets the next acquireConnection() on any key sweep it away.
! 287 auto now = std::chrono::steady_clock::now();
! 288 for (const auto& conn : _pool) {
! 289 auto idle_time =
! 290 std::chrono::duration_cast<std::chrono::seconds>(now - conn->lastUsed()).count();
! 291 if (idle_time <= _idle_timeout_secs) {
! 292 return false;
! 293 }
! 294 }
! 295 return true;
! 296 }
297
298 void ConnectionPool::close() {
299 std::vector<std::shared_ptr<Connection>> to_close;
300 {Lines 345-362 345 // lookup, keeping the hot path cheap under a many-identity connect load.
346 auto now = std::chrono::steady_clock::now();
347 auto sweep_interval = std::chrono::seconds(std::max(1, _default_idle_secs));
348 if (now - _last_sweep >= sweep_interval) {
! 349 _last_sweep = now;
! 350 for (auto it = _pools.begin(); it != _pools.end();) {
! 351 if (it->first != key && it->second && it->second->canEvict()) {
! 352 evicted.push_back(it->second);
! 353 it = _pools.erase(it);
! 354 } else {
! 355 ++it;
! 356 }
! 357 }
! 358 }
359 auto& pool_ref = _pools[key];
360 if (!pool_ref) {
361 LOG("Creating new connection pool");
362 pool_ref = std::make_shared<ConnectionPool>(_default_max_size, _default_idle_secs);Lines 366-379 366 // Close evicted pools outside _manager_mutex: close() disconnects ODBC
367 // handles (releasing the GIL), which must never run while holding
368 // _manager_mutex or we risk a mutex/GIL lock-ordering deadlock.
369 for (auto& evicted_pool : evicted) {
! 370 try {
! 371 evicted_pool->close();
! 372 } catch (const std::exception& ex) {
! 373 LOG("ConnectionPoolManager: closing evicted pool failed: %s", ex.what());
! 374 }
! 375 }
376 // Call acquire() outside _manager_mutex. acquire() may release the GIL
377 // during the ODBC connect call; holding _manager_mutex across that would
378 // create a mutex/GIL lock-ordering deadlock. connStr (not key) is used to
379 // establish new physical connections.Lines 403-412 403 if (conn) {
404 try {
405 conn->disconnect();
406 } catch (const std::exception& ex) {
! 407 LOG("ConnectionPoolManager::returnConnection: disconnect of orphaned "
! 408 "connection failed: %s",
409 ex.what());
410 }
411 }
412 }Lines 434-444 434 _pools.clear();
435 // Nothing left to sweep; reset the throttle so a fresh pool set after this
436 // is swept on its next acquireConnection().
437 _last_sweep = std::chrono::steady_clock::time_point{};
! 438 }
! 439
! 440 size_t ConnectionPoolManager::poolCount() {
! 441 std::lock_guard<std::mutex> lock(_manager_mutex);
442 return _pools.size();
443 }📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.9%
mssql_python.pybind.connection.connection_pool.cpp: 68.8%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 76.4%
mssql_python.__init__.py: 77.3%
mssql_python.row.py: 77.6%
mssql_python.pybind.connection.connection.cpp: 79.1%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.logging.py: 85.5%🔗 Quick Links
|
Pooled Entra connections were keyed only on the sanitized connection string, so two callers hitting the same server with different Entra identities could reuse each other's authenticated connection (cross-identity reuse, #651). connection.py now computes an identity-aware pool key (connStr + NUL + security-context) whenever an access token is present: system/user-assigned MSI keyed on client_id, other token auth keyed on a SHA-256 of the token struct. Non-token auth (SQL, trusted, native ServicePrincipal, Windows Interactive) keeps the legacy bare-connStr key, so behavior is unchanged. Native pool threads an optional pool_key through ConnectionPoolManager::acquireConnection/returnConnection and the pybind Connection init (defaulted, backward compatible); the physical connect still uses the connection string. Adds 6 TestConnectionPoolKey unit tests.
…#659) For MSI auth the identity (client id) is known without a token, so the pool key can be computed up front. Acquisition of the access token is deferred to an internal token_factory callback that the native layer invokes only when it opens a physical connection. A same-identity pool hit therefore reuses the pooled connection and never pays for a token. Token-dependent identities (DefaultAzureCredential / raw token / interactive / device-code) still acquire eagerly because the pool key is the token hash. The internal token_factory is intentionally distinct from the public token_provider= credential API proposed in PR #603 (issue #577).
Add unit tests for auth.get_auth_token_info Windows/interactive and error paths, and is_token_near_expiry wall-clock default. Add native integration tests driving ddbc_bindings.Connection with a token_factory to exercise the C++ lazy-token branches (pool miss vs reuse, non-pooled with/without factory).
… isolation test - Correct SP claims: ServicePrincipal is ODBC-native (creds retained in connStr), so it is already isolated and needs no dedicated key; fix design doc s3.1/s4/s12. - Add an 'Implementation status (PR #660)' section clarifying wired vs deferred (home_account_id, expiry-aware checkout, returnConnection-on-missing, eviction). - Clarify get_token_info docstring: expiry capture is foundation, not yet consumed on checkout. - Add native integration test proving distinct pool keys do not share connections (and that the \x00 key separator survives str->u16string).
…re pooling Remove unreachable acquire-timeout/condition_variable path from the native connection pool, drop the unused is_token_near_expiry helper and orphaned get_auth_token, and delete their now-redundant tests. Clean up code comments.
…pools auth: try get_token() silently first for interactive/device-code credentials and only call authenticate() when the SDK raises AuthenticationRequiredError; cache the resolved home_account_id so subsequent acquisitions stay silent. pool: compare a freshly minted token against a near-expiry connection's current token so a matching token reuses the connection; make canEvict() evaluate the idle timeout so pools holding only idle connections are reclaimed, and close evicted pools outside the manager mutex. tests: add silent-first regression/branch tests (auth) and a real-DB idle identity-pool eviction test (pooling).
pool: throttle the lazy-eviction sweep to at most once per idle-timeout window instead of running on every acquireConnection, cutting global-lock contention under many-identity workloads; eagerly drain sibling idle connections holding a rotated (stale) token in one pass; consolidate the SQL_COPT_SS_ACCESS_TOKEN attribute id into connection.h as a single source of truth. docs: document in README that bring-your-own token lifetime is the application's responsibility, that DefaultAzureCredential is not recommended for multi-user pooling, and that interactive/device-code auth shares one signed-in account per process.
Clear _last_sweep in configure() and closePools() so the first acquireConnection() after a pooling reconfigure or shutdown performs a fresh sweep instead of honoring a stale throttle interval.
There was a problem hiding this comment.
Pull request overview
This PR hardens and improves connection pooling for Entra ID (Azure AD) authentication by making the native pool key identity-aware (preventing cross-identity reuse of authenticated connections), deferring token acquisition until a physical connection is actually opened, and adding token-expiry awareness plus lazy eviction of idle identity pools.
Changes:
- Add
TokenInfo(token struct + expiry + optional account id),get_auth_token_info(), andcompute_identity_key()to support identity-aware pooling and expiry-aware behavior. - Extend the Python
Connectionlayer to compute an identity-awarepool_keyand optionally pass a lazytoken_factorycallback down to the native layer. - Update native pooling to key by
pool_key, invoketoken_factoryonly on pool misses, refresh/reopen near-expiry token connections, and lazily evict idle identity pools; add integration/diagnostic tests and document limitations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_009_pooling.py | Adds integration/regression tests for idle identity-pool eviction and native lazy token-factory behavior. |
| tests/test_008_auth.py | Adds unit coverage for silent-first interactive/device-code flow, token-expiry capture, identity key computation, and pool-key wiring. |
| README.md | Documents multi-user caveats for interactive/device-code, BYO token lifetime responsibility, and DefaultAzureCredential pooling guidance. |
| mssql_python/pybind/ddbc_bindings.cpp | Extends the pybind Connection constructor to accept pool_key and token_factory, and exposes _pool_count() for tests. |
| mssql_python/pybind/connection/connection.h | Introduces token expiry tracking APIs and a shared SQL_COPT_SS_ACCESS_TOKEN constant used by pooling/connection code. |
| mssql_python/pybind/connection/connection.cpp | Implements token-factory invocation, token-expiry recording, and routes pooling acquire/return via identity-aware keys. |
| mssql_python/pybind/connection/connection_pool.h | Extends pool acquire/manager APIs to take pool_key and token_factory; adds canEvict()/poolCount(). |
| mssql_python/pybind/connection/connection_pool.cpp | Implements lazy token acquisition, expiry-aware checkout (token compare + reopen), and throttled idle identity-pool eviction. |
| mssql_python/connection.py | Builds identity-aware pool keys and defers token acquisition via _token_factory where possible; forwards both to native. |
| mssql_python/auth.py | Adds TokenInfo, silent-first interactive/device-code acquisition with cached home_account_id, DefaultAzureCredential warning, and identity key computation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Refresh threshold for expiry-aware checkout: a pooled connection whose access | ||
| // token expires within this many seconds is discarded and reopened with a fresh | ||
| // token rather than handed out (<=10 min). | ||
| static constexpr int TOKEN_EXPIRY_THRESHOLD_SECS = 300; |
| def test_same_token_produces_stable_key(self): | ||
| k1 = compute_identity_key("default", token_struct=b"same") | ||
| k2 = compute_identity_key("default", token_struct=b"same") | ||
| assert k1 == k2 | ||
|
|
||
| def test_none_when_token_required_but_absent(self): | ||
| # default/interactive/devicecode need the token to form the key; | ||
| # without it, signal the caller to acquire then recompute. | ||
| assert compute_identity_key("default") is None | ||
| assert compute_identity_key("interactive") is None | ||
| assert compute_identity_key("devicecode") is None |
Work Item / Issue Reference
Summary
Connection pooling currently keys only on the sanitized connection string. When two callers target the same server with different Entra (Azure AD) identities, they collide onto the same pool and can be handed a connection authenticated as someone else. This PR makes pooling identity-aware so connections are isolated per identity, and adds lazy (deferred) token acquisition so a token is minted only when a physical connection is actually opened — never on a pool hit.
It also handles the token lifecycle inside the pool: near-expiry connections are refreshed on checkout, and pools belonging to identities that are no longer active are reclaimed.
The problem
The approach
1. Identity-aware pool key. The pool key becomes
connStrfor non-token auth (unchanged), orconnStr + "\x00" + <security-context>when a token is involved.compute_identity_key()(auth.py) derives the discriminator per auth type:connStrmsi:<client_id>/msi:systemacct:<home_account_id>tok:<sha256(token_struct)>2. Lazy token acquisition (#659).
connection.pybuilds a_token_factoryclosure and passes it to the native layer. The factory is invoked only when a physical connection is opened (pool miss / non-pooled). Same-identity pool hits acquire no token.3. Silent-first interactive auth. For interactive / device-code,
_acquire_tokenattempts a silentget_token()from the cached account first and only falls back to an interactiveauthenticate()prompt when the SDK raisesAuthenticationRequiredError. The resolvedhome_account_idis cached so subsequent acquisitions stay silent — pooled reconnects refresh without re-prompting.4. Expiry-aware checkout. Token expiry is captured (
TokenInfo,setTokenExpiry) and checked on checkout. When a pooled connection is within the threshold (<= 5 min), the factory mints a fresh token and compares it to the one the connection holds: same token ⇒ refresh recorded expiry and reuse; rotated token ⇒ discard and reopen with the freshly minted token (no second factory call). Sibling connections holding the now-stale token are drained in the same pass.5. Lazy eviction of idle identity pools. Single-use identities would otherwise leave their pools alive forever.
canEvict()now evaluates the idle timeout, andacquireConnectionsweeps aged-out pools. The sweep is throttled to once per idle-timeout window to keep the hot connect path cheap under many-identity workloads, and evicted pools are closed outside the manager mutex to avoid a mutex/GIL deadlock.Backward compatibility
(attrs, expires_on)tuple contract and the legacy bare-dict contract.Documented limitations (README)
token_provider/ raw token): token lifetime is the application's responsibility; the driver's expiry-aware checkout only applies to tokens it mints itself.DefaultAzureCredentialis not recommended for multi-user pooling (it resolves to a single ambient identity); a one-time warning is emitted.Testing
tests/test_008_auth.py(unit, mocked azure-identity): identity-key computation for every auth type, silent-first regression/branch coverage (silent success never prompts;AuthenticationRequiredErrortriggersauthenticate()exactly once; subsequent acquisitions stay silent), token-expiry capture, lazy token-factory edge cases, cross-identity isolation.tests/test_009_pooling.py(integration, live DB — skips without one): native lazy token factory, and a regression test proving an idle identity pool is evicted by a later acquire on a different key.Out of scope / follow-ups