feat(storage): implement App-centric Observability (ACO) OpenTelemetry tracing#17149
Draft
chandra-siri wants to merge 18 commits into
Draft
feat(storage): implement App-centric Observability (ACO) OpenTelemetry tracing#17149chandra-siri wants to merge 18 commits into
chandra-siri wants to merge 18 commits into
Conversation
56e225f to
cab26e6
Compare
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces an in-memory LRU cache for GCS bucket metadata to support App-centric Observability (ACO) within OpenTelemetry tracing. Key changes include the implementation of a thread-safe BucketMetadataCache with background fetching, updates to the Client class to support context management and cache lifecycle, and the re-enabling of system tests in Cloud Build. Feedback identifies a potential StopIteration error in the cache logic if max_size is zero and suggests pre-compiling regular expressions in the tracing module to improve performance.
5544ee4 to
d037656
Compare
…y tracing and lockless bucket metadata caching
d037656 to
d0f4980
Compare
…and debug logging on exceptions
…n behind OpenTelemetry enablement Ensure that BucketMetadataCache lookups and asynchronous cache eviction threads only execute when OpenTelemetry tracing is installed and enabled. Also safely access _extra_headers on Client objects. TAG=agy CONV=c671fa00-7189-45b9-a5af-12f4c7a7c486
TAG=agy CONV=c671fa00-7189-45b9-a5af-12f4c7a7c486
…evel and update Client spans
…sts by using cache.get
…oiding background metadata fetches on reload and exists
…_bucket, lookup_bucket, and create_bucket
…ucket.delete and add unit tests
…ack, and smart eviction system tests
… in system tests, add get method in cache
…ation, split eviction tests, fail on OTel missing
…, add dedicated synchronous cache warming tests
…kup_bucket, add unit tests
…ket.reload, add unit tests and pickle check script
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
feat(storage): implement App-centric Observability (ACO) OpenTelemetry tracing
This PR implements App-centric Observability (ACO) tracing compatibility for the GCS Python SDK (
google-cloud-storage). All OpenTelemetry trace spans produced by bucket and blob operations now seamlessly incorporate mandatory destination resource annotations (gcp.resource.destination.idandgcp.resource.destination.location).Core Architecture & Design
1. Centralized, DRY Telemetry Helper (
_helpers.py)create_trace_span_helperin_helpers.py._opentelemetry_tracing.pyremains completely pristine and identical tomain.blob.py,bucket.py, andclient.py(e.g.,download_as_bytes,upload_from_string,get_bucket,lookup_bucket, etc.).2. Bounded LRU Metadata Cache (
_lru_cache.py,_bucket_metadata_cache.py)LRUCacheutilizing anOrderedDictto support O(1) operations and strict capacity bounding to eliminate memory leaks in long-running applications.BucketMetadataCacheto store bucket locations and project numbers. On cache misses, it spawns background threads (_fetch_background) using singleflight tracking (_inflight_fetches) to prevent server stampedes / thundering herds.403 Forbiddenpermissions errors, the cache permanently registers fallback annotations (projects/_/buckets/{name}) to completely avoid retry storms on subsequent API calls.3. Resilient 404 Existence Eviction (
_http.py,_helpers.py,bucket.py)404 NotFounderror occurs during media transfers or REST calls, a background thread is spawned (with concurrency protection via_inflight_checks) to check if the bucket was deleted out-of-band (bucket.exists()). Ifexists()returnsFalse, the bucket is cleanly evicted from the cache.Bucket.delete()calls synchronously and instantly evict the bucket name from the cache, ensuring real-time consistency.Extensive Testing Suite
1. 100% Sleep-Free System Tests (
test_aco_observability.py)Added a comprehensive system test suite
test_aco_observability.pyexecuting against a live GCS backend:threading.Event(zero static sleeps) for thread coordination, guaranteeing thundering-fast execution and completely eliminating timing flakiness.2. Robust Unit Tests
test__lru_cache.py(LRU correctness, bounding, eviction).test__bucket_metadata_cache.py(concurrency, location resolution, 403 fallback, singleflight).test_delete_hit_evicts_from_cacheinsidetest_bucket.py.Validation Results
All checks, unit tests, and live GCS system tests pass flawlessly:
black/flake8)