Environment details
- OS type and version: macOS / Linux
- Python version: 3.13
- `google-cloud-spanner` version: 3.63.0 (current `main`)
Description
Every Spanner operation that goes through `trace_call()` produces orphan OpenTelemetry metric data points with incomplete resource labels (missing `project_id` and `instance_id`). These orphan data points persist for the process lifetime due to cumulative aggregation and are re-exported to Cloud Monitoring every 60 seconds, which rejects them with:
```
INVALID_ARGUMENT: One or more TimeSeries could not be written:
timeSeries[...]: the set of resource labels is incomplete, missing (instance_id)
```
Root cause
`trace_call()` in `_opentelemetry_tracing.py` wraps every operation with a bare `MetricsCapture()` (no `resource_info`). Meanwhile, every caller of `trace_call` already provides its own `MetricsCapture(self._resource_info)` with correct labels.
When Python evaluates `with trace_call(...) as span, MetricsCapture(self._resource_info):`, two separate `MetricsTracer` instances are created:
- `tracer_A` (from `trace_call`'s internal `MetricsCapture()`): has `instance_config`, `location`, `client_hash`, `client_uid`, `client_name` from the factory, but never receives `project_id` or `instance_id`
- `tracer_B` (from the caller's `MetricsCapture(resource_info)`): has correct labels and overwrites `tracer_A` in the context variable
On exit, `tracer_B` records correct metrics first, then `tracer_A` records metrics with incomplete labels. Since the `SpannerMetricsTracerFactory` never has `project_id`/`instance_id` in its `_client_attributes` (they are only set per-tracer via `resource_info` or the `MetricsInterceptor`), `tracer_A` always starts without them and is never populated, because the `MetricsInterceptor` only touches the current context-var tracer, which is `tracer_B`. The toy sketch below illustrates the mechanics.
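A self-contained toy model of the interaction (the class and variable names mirror the library's, but the bodies are deliberately simplified assumptions, not the real implementation):

```python
# Toy model of the bug: two nested MetricsCapture context managers create
# two tracers, and only the inner one is reachable via the context variable.
import contextvars
from contextlib import contextmanager

_current_tracer = contextvars.ContextVar("current_tracer", default=None)

class MetricsTracer:
    def __init__(self, resource_info):
        # Without resource_info, project_id/instance_id are simply absent.
        self.labels = dict(resource_info or {})

class MetricsCapture:
    def __init__(self, resource_info=None):
        self.tracer = MetricsTracer(resource_info)
    def __enter__(self):
        _current_tracer.set(self.tracer)  # a later entry overwrites an earlier one
        return self
    def __exit__(self, *exc):
        print("recording metrics with labels:", self.tracer.labels)

@contextmanager
def trace_call(name):
    with MetricsCapture():  # tracer_A: bare, no resource_info
        yield name

resource_info = {"project_id": "my-project", "instance_id": "my-instance"}
with trace_call("CloudSpanner.Op") as span, MetricsCapture(resource_info):
    pass  # tracer_B overwrote tracer_A in _current_tracer on entry

# Context managers exit inner-first, so the output is:
#   recording metrics with labels: {'project_id': 'my-project', 'instance_id': 'my-instance'}
#   recording metrics with labels: {}    <- tracer_A, incomplete labels
```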
With OpenTelemetry's cumulative aggregation, once these orphan aggregation buckets are created, they persist for the process lifetime and are re-exported every 60 seconds, as the sketch below shows in isolation.
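This re-export behavior is a property of the OpenTelemetry SDK itself, not of the Spanner client. A minimal sketch (the metric name and label here are illustrative assumptions) records a single data point and lets the default cumulative temporality re-export it on every collection interval:

```python
# Demonstrates cumulative temporality: a data point recorded once is
# included in every subsequent periodic export for the life of the process.
import time

from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(
    ConsoleMetricExporter(), export_interval_millis=60_000
)
meter = MeterProvider(metric_readers=[reader]).get_meter("demo")
counter = meter.create_counter("attempt_count")

# One recording with an incomplete label set creates a cumulative
# aggregation bucket that is re-exported every 60 seconds from now on.
counter.add(1, {"instance_config": "regional-us-central1"})

time.sleep(180)  # keep the process alive to observe the repeated exports
```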
History
- PR feat: Add Attempt, Operation and GFE Metrics #1302 introduced the metrics system. All `MetricsCapture()` instances were bare, including the one in `trace_call`. The design relied on `MetricsInterceptor` to populate labels during gRPC calls.
- PR feat: implement native asyncio support via Cross-Sync #1509 added the `_resource_info` property and changed all caller sites from `MetricsCapture()` to `MetricsCapture(self._resource_info)` for eager label propagation. However, the bare `MetricsCapture()` inside `trace_call` was not removed, making it redundant and harmful.
Impact
- Affects every Spanner operation (~27 code paths) on every invocation
- Creates persistent orphan metric aggregation buckets
- Produces repeated `INVALID_ARGUMENT` error logs every 60 seconds
- Wastes CPU/network on exporting invalid TimeSeries
- Application functionality is unaffected; valid metrics from the caller's `MetricsCapture` still work
Steps to reproduce
- Create a `spanner.Client()` with metrics enabled (the default)
- Perform any Spanner operation (e.g., `session.create()`, `snapshot.execute_sql()`)
- Observe `INVALID_ARGUMENT` errors logged from the metrics exporter every 60 seconds (a runnable sketch follows)
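A minimal repro sketch; the instance and database names are placeholders, and any project with client-side metrics enabled should do:

```python
from google.cloud import spanner

client = spanner.Client()  # client-side metrics are enabled by default
instance = client.instance("my-instance")    # placeholder name
database = instance.database("my-database")  # placeholder name

# Any operation that goes through trace_call() triggers the bug.
with database.snapshot() as snapshot:
    list(snapshot.execute_sql("SELECT 1"))

# Within ~60 seconds the metrics exporter logs:
#   INVALID_ARGUMENT: One or more TimeSeries could not be written:
#   ... the set of resource labels is incomplete, missing (instance_id)
```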
Suggested fix
Remove the bare `MetricsCapture()` from `trace_call`; it is redundant, since every caller already provides its own `MetricsCapture(self._resource_info)`. See PR #1522. A simplified sketch of the resulting shape of `trace_call` follows.
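A minimal sketch of `trace_call` after the change; the signature is simplified as an assumption, since the real function takes more parameters and handles observability options:

```python
from contextlib import contextmanager

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@contextmanager
def trace_call(name, session=None, extra_attributes=None):
    # Before the fix, the `with` clause here also entered a bare
    # MetricsCapture(), creating tracer_A with incomplete labels.
    # After the fix, only the span is managed here; every caller wraps
    # its operation in MetricsCapture(self._resource_info) itself.
    with tracer.start_as_current_span(name) as span:
        yield span
```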