tests/cloud: migrate ducktape cloud tests from cloud-api to public-api v1 #29764

Open
sago2k8 wants to merge 1 commit into dev from cloud-tests-migrate-to-public-api-v1
@sago2k8
Contributor

@sago2k8 sago2k8 commented Mar 6, 2026

Summary

  • Migrate cluster CRUD, networks, resource groups, operations, and network peerings from legacy cloud-api (/api/v1/) and v1beta2 (/v1beta2/) endpoints to public-api v1 (/v1/) endpoints
  • Update rpcloud_client.py endpoint definitions and HTTP methods to route through public_api_url
  • Update redpanda_cloud.py cluster creation/deletion, operation polling, region lookup, and peering flows to use v1 endpoints
  • Update rp_cloud_cleanup.py to handle v1 response format (type/state enum normalization, field name changes)
  • Preserve legacy endpoints only where no v1 equivalent exists: tiers (/v1beta2/tiers), install pack versions, and prometheus credentials
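The operation polling mentioned in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: `wait_for_operation`, the `fetch` callable, the `operation` envelope key, and the `STATE_COMPLETED` / `STATE_FAILED` values are assumptions made for the example.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch: Callable[[str], dict],
    operation_id: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> dict:
    """Poll GET /v1/operations/{id} via `fetch` until a terminal state.

    `fetch` is a hypothetical callable that takes the endpoint path and
    returns the decoded JSON body; the state names are assumed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch(f"/v1/operations/{operation_id}")
        # Accept both an enveloped and a bare operation object.
        op = body.get("operation", body)
        state = op.get("state")
        if state == "STATE_COMPLETED":
            return op
        if state == "STATE_FAILED":
            raise RuntimeError(f"operation {operation_id} failed: {op!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} still in {state!r}")
        time.sleep(interval_s)


# Usage with a stubbed fetch that completes on the third poll.
_responses = iter(
    [
        {"operation": {"id": "op-1", "state": "STATE_IN_PROGRESS"}},
        {"operation": {"id": "op-1", "state": "STATE_IN_PROGRESS"}},
        {"operation": {"id": "op-1", "state": "STATE_COMPLETED"}},
    ]
)
result = wait_for_operation(lambda path: next(_responses), "op-1", interval_s=0.0)
```

Passing the HTTP call in as a callable keeps the loop testable without a live control plane; in the real client the call would presumably go through the existing GET helper with `base_url` set to `public_api_url`.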

Endpoints migrated

| From | To |
| --- | --- |
| POST /v1beta2/resource-groups | POST /v1/resource-groups |
| DELETE /v1beta2/resource-groups/{id} | DELETE /v1/resource-groups/{id} |
| GET /v1beta2/clusters, GET /v1beta2/clusters/{id} | GET /v1/clusters, GET /v1/clusters/{id} |
| POST /v1beta2/clusters | POST /v1/clusters |
| POST /v1beta2/networks | POST /v1/networks |
| GET /v1beta2/operations/{id} | GET /v1/operations/{id} |
| GET /api/v1/clusters/{id} (most uses) | GET /v1/clusters/{id} |
| DELETE /api/v1/clusters/{id} | DELETE /v1/clusters/{id} |
| GET /api/v1/networks/{id}, GET /api/v1/networks | GET /v1/networks/{id}, GET /v1/networks |
| POST /api/v1/networks/{id}/network-peerings | POST /v1/networks/{id}/network-peerings |
| GET /api/v1/networks/{id}/network-peerings | GET /v1/networks/{id}/network-peerings |
| POST ListRegions (ConnectRPC) | GET /v1/regions/{cloud_provider} |
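
For the REST rows in the table above, the path rewrite is mechanical, and a small helper can make that explicit. The sketch below is hypothetical (`to_v1_path` and its constants are not part of this PR) and works on paths only, so it cannot tell apart the `/api/v1/clusters/{id}` calls that stay on the legacy API for install pack version and prometheus credentials; those would still need to be handled at the call site.

```python
import re

# Prefixes listed under "Intentionally unchanged" are never rewritten.
_UNCHANGED_PREFIXES = (
    "/v1beta2/tiers",
    "/api/v1/clusters-resources/install-pack-versions",
)
_LEGACY_PREFIX = re.compile(r"^/(?:api/v1|v1beta2)/")
# Top-level resources that the table shows as migrated to /v1/.
_MIGRATED_RESOURCES = ("resource-groups", "clusters", "networks", "operations")


def to_v1_path(path: str) -> str:
    """Rewrite a legacy or v1beta2 path to its /v1/ form when migrated."""
    if path.startswith(_UNCHANGED_PREFIXES):
        return path
    match = _LEGACY_PREFIX.match(path)
    if not match:
        return path  # already v1, or not a cloud-api path at all
    rest = path[match.end():]
    if rest.split("/", 1)[0] in _MIGRATED_RESOURCES:
        return "/v1/" + rest
    return path
```

The ListRegions row is not covered, since it replaces a ConnectRPC call with a REST endpoint rather than renaming a path.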

Intentionally unchanged

  • /v1beta2/tiers — no v1 equivalent yet
  • /api/v1/clusters-resources/install-pack-versions — no v1 equivalent
  • /api/v1/clusters/{id} for install pack version and prometheus credentials — no v1 equivalent
  • Console API calls (/api/users, /api/acls) — out of scope

All remaining legacy/beta references are marked with TODO: DEVPROD-2525 comments.

Test plan

  • Run FMC cluster creation/deletion lifecycle test
  • Run BYOC cluster creation/deletion lifecycle test
  • Run private-network cluster test with VPC peering flow
  • Run rp_cloud_cleanup.py against test environment
  • Grep for remaining /api/v1/ and /v1beta2/ — verify all are intentional
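
The last check could be automated along these lines. This is a sketch under assumptions: the helper names are invented for illustration, and the rule that a `TODO: DEVPROD-2525` marker on the same line or within three lines above makes a reference "intentional" is an assumed convention, not something this PR defines.

```python
import re
from pathlib import Path

LEGACY_REF = re.compile(r"/api/v1/|/v1beta2/")
TODO_MARKER = "TODO: DEVPROD-2525"


def unmarked_legacy_refs(lines: list[str], window: int = 3) -> list[tuple[int, str]]:
    """Return (line_no, text) for legacy references with no nearby TODO marker."""
    hits = []
    for i, line in enumerate(lines):
        if not LEGACY_REF.search(line):
            continue
        # Look at the current line plus up to `window` lines above it.
        context = lines[max(0, i - window): i + 1]
        if not any(TODO_MARKER in c for c in context):
            hits.append((i + 1, line.strip()))
    return hits


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Apply the check to every .py file under `root`; keep files with hits."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = unmarked_legacy_refs(path.read_text(encoding="utf-8").splitlines())
        if hits:
            report[str(path)] = hits
    return report
```

An empty result from `scan_tree(Path("tests"))` would indicate that every remaining legacy or beta reference carries the marker.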

…i v1

Migrate cluster lifecycle operations (create, get, delete, list),
networks, resource groups, operations, and peerings from legacy
cloud-api (/api/v1/) and v1beta2 (/v1beta2/) endpoints to public-api
v1 (/v1/) endpoints.

Endpoints that remain on legacy/beta due to no v1 equivalent:
- /v1beta2/tiers (no v1 tiers endpoint yet)
- /api/v1/clusters-resources/install-pack-versions (internal only)
- /api/v1/clusters/{id} for install pack version and prometheus creds
- /api/v1/users (console API, out of scope)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 14:13
Contributor

Copilot AI left a comment

Pull request overview

Migrates Redpanda Cloud ducktape test utilities from legacy cloud-api (/api/v1) and v1beta2 (/v1beta2) endpoints to public-api v1 (/v1), updating request/response shapes and routing to public_api_url.

Changes:

  • Updated cluster/network/resource-group/operations/peering calls to use /v1/... endpoints and public_api_url.
  • Adjusted request/response wrapping for v1 (e.g. {"network": ...}, {"cluster": ...}) and normalized some v1 enum/field differences in cleanup.
  • Preserved legacy endpoints where v1 equivalents don’t exist (install pack versions, prometheus credentials, tiers) via TODOs.
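
The request/response wrapping described above can be illustrated with a pair of small helpers. These are hypothetical (the PR inlines the equivalent logic, for example `_network.get("network", _network)`), and the sketch additionally tolerates non-dict responses, which the inlined form does not.

```python
from typing import Any


def wrap(resource: str, body: dict) -> dict:
    """Build a v1-style request body, e.g. {"network": {...}}."""
    return {resource: body}


def unwrap(resource: str, response: Any) -> Any:
    """Return the enveloped object if present, otherwise the response as-is.

    Falling back to the raw response keeps callers working against both
    enveloped payloads and bare (legacy-style) payloads, including lists.
    """
    if isinstance(response, dict) and resource in response:
        return response[resource]
    return response


request_body = wrap("network", {"name": "demo-network"})
network = unwrap("network", {"network": {"id": "n1"}})
legacy_list = unwrap("network_peerings", [{"id": "p1"}])
```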

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/rptest/services/redpanda_cloud.py | Migrates cluster/network/ops/peering flows to v1 endpoints and adapts payload/response handling. |
| tests/rptest/services/provider_clients/rpcloud_client.py | Updates endpoint builders and client helpers to target public-api v1 and unwrap v1 response envelopes. |
| tests/rp_cloud_cleanup.py | Normalizes v1 response fields/enums for cleanup and updates peering handle construction. |
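
As a rough idea of what enum normalization in the cleanup script might look like: the sketch below assumes protobuf-style values such as `TYPE_BYOC` and `STATE_READY` on the v1 side and prefix-free values on the legacy side. Those value shapes, and the function names, are assumptions for illustration; the PR page does not show the actual values.

```python
def normalize_enum(value, prefix: str):
    """Strip an assumed v1 prefix and lowercase, e.g. 'STATE_READY' -> 'ready'."""
    if not isinstance(value, str):
        return value  # leave missing or non-string values untouched
    upper = value.upper()
    if upper.startswith(prefix):
        upper = upper[len(prefix):]
    return upper.lower()


def normalize_cluster(cluster: dict) -> dict:
    """Return a copy of `cluster` with `type` and `state` in a common form."""
    out = dict(cluster)
    if "type" in out:
        out["type"] = normalize_enum(out["type"], "TYPE_")
    if "state" in out:
        out["state"] = normalize_enum(out["state"], "STATE_")
    return out


v1_style = normalize_cluster({"id": "c1", "type": "TYPE_BYOC", "state": "STATE_READY"})
legacy_style = normalize_cluster({"id": "c2", "type": "FMC", "state": "ready"})
```

Normalizing once at the boundary lets the rest of the cleanup logic compare against a single spelling regardless of which API produced the record.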

Comment on lines +267 to 268
return _network.get("network", _network)

Copilot AI Mar 6, 2026

get_network() now targets /v1/networks and returns _network.get("network", _network). There are in-repo consumers (e.g. CloudCluster._prepare_fmc_network_vpc_info and cloudv2_object_store_blocked) that index into the legacy-looking network shape (status.created.providerNetworkDetails... and camelCase keys like vpcId/cidrBlock). Since this PR also moves other resources to a v1/snake_case schema (e.g. cidr_block, resource_group_id), those consumers are at high risk of breaking unless the v1 network response still contains the legacy fields. Consider either updating those call sites to the v1 schema, or making get_network() normalize/abstract the network response so callers aren’t tied to a specific version’s field naming.

Suggested change
- return _network.get("network", _network)
+ # Normalize the network response so callers can rely on both legacy
+ # and v1 field naming / structure.
+ raw_network = _network.get("network", _network)
+ # If the payload is not a dict, just return it as-is.
+ if not isinstance(raw_network, dict):
+     return raw_network
+ # Work on a shallow copy so we don't accidentally mutate shared state.
+ network = dict(raw_network)
+ # Derive legacy/camelCase providerNetworkDetails from v1/snake_case fields
+ # when possible. This keeps existing callers that expect
+ # status.created.providerNetworkDetails.vpcId/cidrBlock working even when
+ # the backend returns v1/snake_case fields (e.g. vpc_id, cidr_block).
+ vpc_id = network.get("vpc_id") or network.get("vpcId")
+ cidr_block = network.get("cidr_block") or network.get("cidrBlock")
+ if vpc_id is not None or cidr_block is not None:
+     # Only synthesize the status structure if it is missing or not a dict,
+     # to avoid clobbering a fully-formed legacy response.
+     if "status" not in network or not isinstance(network.get("status"), dict):
+         network["status"] = {}
+     status = network["status"]
+     created = status.setdefault("created", {})
+     provider_details = created.setdefault("providerNetworkDetails", {})
+     if vpc_id is not None:
+         provider_details["vpcId"] = vpc_id
+     if cidr_block is not None:
+         provider_details["cidrBlock"] = cidr_block
+ return network

Copilot uses AI. Check for mistakes.
    endpoint=self.current.network_endpoint,
    base_url=self.config.public_api_url,
)
self.vpc_peering = _peerings.get("network_peerings", [_peerings])[0]
Copilot AI Mar 6, 2026

_peerings may be a list (legacy endpoint returned a list and _http_get() can return any JSON type). Calling .get(...) on a list will raise AttributeError. Consider handling both response shapes here (dict with network_peerings vs a raw list) before indexing [0].

Suggested change
- self.vpc_peering = _peerings.get("network_peerings", [_peerings])[0]
+ # Handle both legacy (list) and current (dict) response shapes
+ if isinstance(_peerings, dict):
+     network_peerings = _peerings.get("network_peerings")
+     if isinstance(network_peerings, list) and network_peerings:
+         self.vpc_peering = network_peerings[0]
+     elif network_peerings:
+         # Single peering object instead of a list
+         self.vpc_peering = network_peerings
+     else:
+         raise RuntimeError(
+             f"No network peerings found in response: {_peerings!r}"
+         )
+ elif isinstance(_peerings, list) and _peerings:
+     # Legacy API returns a raw list of peerings
+     self.vpc_peering = _peerings[0]
+ else:
+     raise RuntimeError(
+         f"Unexpected network peering response shape: {_peerings!r}"
+     )

    endpoint=self.current.network_endpoint,
    base_url=self.config.public_api_url,
)
self.vpc_peering = _peerings.get("network_peerings", [_peerings])[0]
Copilot AI Mar 6, 2026

_peerings may be a list (legacy endpoint returned a list and _http_get() can return any JSON type). Calling .get(...) on a list will raise AttributeError. Consider handling both response shapes here (dict with network_peerings vs a raw list) before indexing [0].

Suggested change
- self.vpc_peering = _peerings.get("network_peerings", [_peerings])[0]
+ if isinstance(_peerings, list):
+     peerings_list = _peerings
+ else:
+     peerings_list = _peerings.get("network_peerings", [_peerings])
+ self.vpc_peering = peerings_list[0]

@@ -749,9 +751,9 @@ def _create_new_cluster(self):
self._logger.warning(f'creating network name "{self.current.name}-network"')
# Prepare network payload block
_body = self._create_network_payload()
Copilot AI Mar 6, 2026

The network create request now posts json={"network": _body}. _create_network_payload() sets _net = (self.config.network,), so for private networking cidr_block becomes a tuple that will JSON-encode to a one-element array rather than a CIDR string. If the v1 API expects cidr_block to be a string (as in the public-network case), this will break private-network cluster creation; consider making _net a plain string and ensuring cidr_block is always a string.

Suggested change
- _body = self._create_network_payload()
+ _body = self._create_network_payload()
+ # Ensure cidr_block is always a string (not a tuple) before sending to the v1 API
+ cidr_block = _body.get("cidr_block")
+ if isinstance(cidr_block, tuple) and len(cidr_block) == 1:
+     _body["cidr_block"] = cidr_block[0]
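
The pitfall described in this comment is easy to reproduce in isolation. The snippet below is a standalone demonstration with a stand-in CIDR value; it does not use the real `_create_network_payload()`, and whether the v1 API actually rejects an array here is not confirmed by this page.

```python
import json

network = "10.0.0.0/16"  # stand-in for self.config.network

as_tuple = (network,)  # the trailing comma builds a one-element tuple
as_string = network    # plain string, as in the public-network case

# json.dumps encodes Python tuples as JSON arrays.
tuple_payload = json.dumps({"network": {"cidr_block": as_tuple}})
string_payload = json.dumps({"network": {"cidr_block": as_string}})

print(tuple_payload)   # {"network": {"cidr_block": ["10.0.0.0/16"]}}
print(string_payload)  # {"network": {"cidr_block": "10.0.0.0/16"}}
```

Dropping the trailing comma at the source would avoid the need to patch the payload after it is built.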

@vbotbuildovich
Collaborator

Retry command for Build#81457

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/random_node_operations_smoke_test.py::RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test@{"cloud_storage_type":1,"mixed_versions":false}

@vbotbuildovich
Collaborator

CI test results

test results on build #81457

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/81457#019cc389-4562-4281-af1c-7e9bdf133f32 | FLAKY | 24/31 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0438, p0=0.0017, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/81457#019cc386-6f14-4295-8eef-45bdb3ec4b35 | FLAKY | 9/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0893, p0=0.6075, reject_threshold=0.0100. adj_baseline=0.2447, p1=0.2563, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |
