Showcase: automated template compatibility testing + AI debugging workflow alignment #280

Draft: kalanyuz wants to merge 263 commits into cre-reliability-load-test from experimental/agent-skills
@kalanyuz commented Feb 23, 2026

PR: Showcase automated template compatibility testing and AI debugging workflow alignment

Summary

Brief overview of what this PR implements:

  1. Template Compatibility Guardrail: Adds a unified, table-driven compatibility suite for template IDs 1..5 (now with explicit cases for IDs 3 and 5) plus a drift canary that flags registry templates missing from the test table.
  2. Process and Operational Hardening: Adds CI path-gated enforcement, documents embedded-vs-dynamic source modes, and codifies repeatable QA/template/TUI agent workflows.
flowchart LR
    A[Template or creinit change] --> B[Path Filter Job]
    B --> C[TestTemplateCompatibility]
    C --> D[PR Signal]
    D --> E[QA + Skills + Docs Alignment]

Changes

1. Template Compatibility Automation

File: test/template_compatibility_test.go

What changed:

  • Added TestTemplateCompatibility table-driven suite for template IDs 1..5.
  • Added explicit cases for TS_HelloWorld_Template3 and TS_ConfHTTP_Template5.
  • Added drift canary TestTemplateCompatibility_AllTemplatesCovered.

Result: Template init/build/simulate regressions are caught deterministically and coverage drift is flagged early.
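A minimal sketch of the drift-canary idea, assuming the template registry can expose its list of IDs and the suite keeps a table of cases. `compatCase`, `compatCases`, `uncoveredTemplates`, and `canaryError` are hypothetical names; only the case names for IDs 3 and 5 come from this PR. The real suite also runs init/build/simulate for each row, which is omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// compatCase is a hypothetical shape for one row of the TestTemplateCompatibility table.
type compatCase struct {
	TemplateID int
	Name       string
}

// compatCases covers IDs 1..5. The names for IDs 3 and 5 come from the PR;
// the others are placeholders.
var compatCases = []compatCase{
	{1, "template-1-placeholder"},
	{2, "template-2-placeholder"},
	{3, "TS_HelloWorld_Template3"},
	{4, "template-4-placeholder"},
	{5, "TS_ConfHTTP_Template5"},
}

// uncoveredTemplates is the canary check: it returns, sorted ascending, every
// registry ID that has no row in the test table.
func uncoveredTemplates(registryIDs []int, cases []compatCase) []int {
	covered := make(map[int]bool, len(cases))
	for _, c := range cases {
		covered[c.TemplateID] = true
	}
	var missing []int
	for _, id := range registryIDs {
		if !covered[id] {
			missing = append(missing, id)
		}
	}
	sort.Ints(missing)
	return missing
}

// canaryError builds the message a canary test would pass to t.Fatal.
func canaryError(missing []int) error {
	if len(missing) == 0 {
		return nil
	}
	return fmt.Errorf("templates missing from compatibility table: %v", missing)
}
```

Adding template ID 6 to the registry without a matching row makes `uncoveredTemplates` return `[6]`, which is the drift the canary is meant to surface.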

2. CI Enforcement Path

File: .github/workflows/pull-request-main.yml

What changed:

  • Added the template-compat-path-filter job.
  • Added the ci-test-template-compat job, which runs on Linux and Windows.
  • The job runs go test -v -timeout 20m -run TestTemplateCompatibility ./test/ only when template-impacting paths change.

Result: Compatibility checks become a targeted PR guardrail instead of ad-hoc local validation.
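The path gate can be pictured as a small predicate over the changed files. This Go sketch is illustrative only: the actual job uses a workflow-level path filter, and the entries in `templateImpactingPaths` (and the prefix-vs-exact matching rule) are assumed examples, not the real filter list.

```go
package main

import "strings"

// templateImpactingPaths is a hypothetical filter list; the real list lives in
// the template-compat-path-filter job and may differ. Entries ending in "/" are
// treated as directory prefixes, all others as exact file paths.
var templateImpactingPaths = []string{
	"cmd/creinit/",
	"test/template_compatibility_test.go",
	".github/workflows/pull-request-main.yml",
}

// shouldRunTemplateCompat reports whether any changed file matches a filter
// entry, i.e. whether ci-test-template-compat should run for this PR.
func shouldRunTemplateCompat(changed, filters []string) bool {
	for _, file := range changed {
		for _, f := range filters {
			if strings.HasSuffix(f, "/") && strings.HasPrefix(file, f) {
				return true // file sits under a watched directory
			}
			if file == f {
				return true // file is watched explicitly
			}
		}
	}
	return false
}
```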

3. Template Repository Operational Setup

Files:

  • submodules.yaml
  • scripts/setup-submodules.sh
  • Makefile
  • .gitignore

What changed:

  • Added a clone/update/clean workflow for the external cre-templates reference checkout.
  • Added make targets and managed .gitignore handling for that checkout.

Result: External template workspace setup is reproducible without using Git submodules.
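A sketch of the clone/update/clean decision logic, written in Go for consistency with the rest of this PR even though the real helper is a shell script. The function name, the exact git invocations, and the arguments are assumptions; in practice the URL, directory, and ref would presumably come from submodules.yaml.

```go
package main

import "fmt"

// planTemplateCheckout returns the commands a clone/update/clean helper might
// run for the external cre-templates checkout. Hypothetical: the real script's
// commands and flags may differ.
func planTemplateCheckout(action string, dirExists bool, repoURL, dir, ref string) ([][]string, error) {
	switch action {
	case "clone", "update":
		if !dirExists {
			// First run or a cleaned workspace: fresh clone of the requested ref.
			return [][]string{{"git", "clone", "--branch", ref, repoURL, dir}}, nil
		}
		// Existing checkout: fetch and fast-forward instead of re-cloning.
		return [][]string{
			{"git", "-C", dir, "fetch", "origin", ref},
			{"git", "-C", dir, "checkout", ref},
			{"git", "-C", dir, "merge", "--ff-only", "origin/" + ref},
		}, nil
	case "clean":
		if !dirExists {
			return nil, nil // nothing to remove
		}
		return [][]string{{"rm", "-rf", dir}}, nil
	default:
		return nil, fmt.Errorf("unknown action %q (want clone, update, or clean)", action)
	}
}
```

Treating `clone` and `update` the same keeps the helper idempotent: running either twice converges on an up-to-date checkout instead of failing on an existing directory.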

4. Agent and Framework Alignment

Files:

  • AGENTS.md
  • .claude/skills/cre-add-template/**
  • .claude/skills/cre-qa-runner/**
  • .claude/skills/cre-cli-tui-testing/**
  • .claude/skills/using-cre-cli/**
  • testing-framework/*.md
  • .claude/skills/skill-audit-report.md

What changed:

  • Documented the current embedded template source and the upcoming, branch-gated dynamic pull mode.
  • Updated skills with clearer invocation boundaries and guidance for handling dynamic mode.
  • Aligned testing-framework docs with the dual-source model and an advisory-first rollout of dynamic validation.

Result: Consistent operator + agent guidance across docs, skills, and CI expectations.
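One way to express the "embedded by default, dynamic only behind a gate" rule in code. `TemplateSource`, `resolveTemplateSource`, and the `dynamicEnabled` gate are hypothetical; on this branch creinit only has the embedded path, and the fallback note is an illustrative choice.

```go
package main

// TemplateSource is a hypothetical name for where creinit reads templates from.
type TemplateSource string

const (
	SourceEmbedded TemplateSource = "embedded" // current default, compiled into the binary
	SourceDynamic  TemplateSource = "dynamic"  // upcoming pull mode, branch-gated
)

// resolveTemplateSource keeps embedded as the default and honors a dynamic
// request only when the gate is on. The returned note explains a fallback so
// QA reports can record which source was actually used.
func resolveTemplateSource(requested TemplateSource, dynamicEnabled bool) (TemplateSource, string) {
	switch {
	case requested == SourceDynamic && dynamicEnabled:
		return SourceDynamic, ""
	case requested == SourceDynamic:
		return SourceEmbedded, "dynamic template mode is not enabled in this build; using embedded templates"
	default:
		return SourceEmbedded, ""
	}
}
```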

Impact

Modified (selected files)

  • test/template_compatibility_test.go - New unified compatibility suite + canary.
  • .github/workflows/pull-request-main.yml - Path filter + compatibility job.
  • AGENTS.md - Source-mode architecture and maintenance workflow.
  • submodules.yaml - External template repo relationship config.
  • scripts/setup-submodules.sh - Deterministic clone/update/clean script.
  • .claude/skills/cre-add-template/SKILL.md - Template workflow and checklist boundaries.
  • .claude/skills/cre-qa-runner/SKILL.md - QA runbook/reporting + source provenance guidance.
  • testing-framework/README.md - Embedded baseline + branch-gated dynamic framing.

Affected (no runtime behavior change)

  • cmd/creinit/* runtime flow remains on the embedded-template baseline.
  • ✅ Dynamic-template behavior is documented as branch-gated, not active default behavior.

Testing

Compatibility Suite

  • go test -v -timeout 20m -run TestTemplateCompatibility ./test/
  • ✅ Includes IDs 1, 2, 3, 4, 5 and the canary coverage guard.

Skill/Docs Consistency Checks

  • ✅ Skill description overlap scan (rg "^description:" .claude/skills/*/SKILL.md).
  • ✅ Cross-doc alignment pass across AGENTS.md, testing-framework/*.md, and updated skill docs.
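The `rg` scan lists the description lines, but judging whether two of them overlap is still a manual read. A hypothetical Go helper could automate a rough version of that judgment with word-set Jaccard similarity; the function names, tokenization, and threshold are arbitrary choices for illustration, not part of this PR.

```go
package main

import (
	"bufio"
	"sort"
	"strings"
)

// extractDescription returns the value of the first line starting with
// "description:", i.e. what `rg "^description:"` prints for a SKILL.md file.
func extractDescription(skillMD string) string {
	sc := bufio.NewScanner(strings.NewReader(skillMD))
	for sc.Scan() {
		if rest, ok := strings.CutPrefix(sc.Text(), "description:"); ok {
			return strings.TrimSpace(rest)
		}
	}
	return ""
}

// wordSet lowercases text and returns its words with surrounding punctuation trimmed.
func wordSet(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		if w = strings.Trim(w, `.,;:()"'`); w != "" {
			set[w] = true
		}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B|, or 0 when both sets are empty.
func jaccard(a, b map[string]bool) float64 {
	inter := 0
	for w := range a {
		if b[w] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// overlappingSkills returns sorted "a|b" pairs whose descriptions reach the threshold.
func overlappingSkills(descs map[string]string, threshold float64) []string {
	names := make([]string, 0, len(descs))
	for n := range descs {
		names = append(names, n)
	}
	sort.Strings(names)
	var pairs []string
	for i := range names {
		for j := i + 1; j < len(names); j++ {
			if jaccard(wordSet(descs[names[i]]), wordSet(descs[names[j]])) >= threshold {
				pairs = append(pairs, names[i]+"|"+names[j])
			}
		}
	}
	return pairs
}
```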

Deployment Notes

Pre-Deployment Requirements

  • No production deployment steps required.
  • CI workflow changes need only the standard Actions permissions already granted to the repository's workflows.

Deployment Order

  1. Merge PR.
  2. Let the PR workflow enforce the template-compat job whenever relevant paths change.
  3. Use updated skills/runbooks for subsequent template additions and pre-release QA.

Backward Compatibility

  • Backward compatible.
  • No CLI API/runtime behavior changes introduced by this PR.

Verification Checklist

  • Unified compatibility test covers template IDs 1..5.
  • Canary guard exists for registry/test-table drift.
  • CI path filter + compatibility job added.
  • Skills updated for invocation clarity and dynamic-mode branch gating.
  • Testing framework docs aligned to embedded-now + dynamic-next model.

Rollback Plan

If issues occur:

  1. Revert workflow additions in .github/workflows/pull-request-main.yml.
  2. Revert compatibility test file test/template_compatibility_test.go.
  3. Revert docs/skills updates in AGENTS.md, .claude/skills/**, and testing-framework/*.md.

timothyF95 and others added 30 commits October 29, 2025 20:12
* reorder and enrich

* lint

* cleanup
* reorder and enrich

* lint

* cleanup

* Add flags and decrease sleep
fix: error on refresh token
Reduce sleep duration before exiting the command.
* Update TS templates to use the latest SDK

* Point to beta release
* Validate credentials

* fix tests
* Add workflowID to user event

* clean up logs
Update default ts sdk version to beta.1
* add workflow language to runtimecontext

* lint

* review feedback
refactor: split each capability dependency version
* fix: remove progress bar in windows installation script

* fix: bash script

* fix: bash script
* update go por README trigger time

* update transaction hash

* update transaction hash in ts template

* update TS template to be the same as GO
* feat: add org name in whoami

* merge into single query
* feat: remove dummy private key from init

* use dummy eth-key if not set for simulation

* force private key for broadcast

* improve error message
ejacquier and others added 30 commits April 16, 2026 20:33
#377)

* Replace getOrganization with getCreOrganizationInfo in CLI auth validation

* Fix test
* Stop sending deprecated onchain related fields for artifacts management

* initial refactor

* improve

* improve sim env

* linter
* initial implementation

* linter

* generate docs
* initial impl

* lint

* implement

* linter

* fix e2e test

* linter
* initial impl

* lint

* implement

* linter

* fix e2e test

* linter

* initial implementation

* refactor

* linter

* linter
* Refactor SimulatedEnvironment to builder pattern

* fix tests
* Resolve workflow owner by deployment registry

* Fix tests

* Review feedback

* Update e2e tests
…activate/pause (#391)

* improve workflow details

* add registry also for onchain deploy

* remove duplicated code
* wrap gql errors

* refactor

* refactor
* prompt to skip waiting

* wip done signal

* done signal and prompt to skip wait

* use skip signal

* fix channel race

* move out of coroutine

* fix non-interactive mode

* simplify lifecycle

* update deps

* linter: replace deprecated method
…#373)

* refactor: make simulator chain-agnostic with pluggable chain family architecture

Introduce ChainFamily interface and package-level registry enabling
future Aptos/Solana chain support. Extract all EVM-specific simulation
code into cmd/workflow/simulate/chain/evm/ package with self-registration
via init().

- Add ChainFamily interface with 7 methods (Name, ResolveClients,
  RegisterCapabilities, ExecuteTrigger, ParseTriggerChainSelector,
  RunHealthCheck, SupportedChains)
- Add thread-safe chain family registry with Register/Get/All/Names
- Move EVM health checks, trigger parsing, supported chains list,
  and limited capabilities into chain/evm package
- Refactor simulate.go to iterate registered families instead of
  hardcoding EVM logic
- Fix EVM chains not being Start()ed after registration
- Remove redundant blank import (named import triggers init())
- Rename test stubs for clarity

* chore: audit refactor for parity and abstraction

- Restore missing debug/error log statements in EVM ResolveClients
- Restore ExecuteTrigger nil guards (f.evmChains, EVMChains[selector])
- Restore ui.Success for EVM log discovery in fetchAndConvertLog
- Restore comments: Use Opaque, runRPCHealthCheck semantics, regex example
- Track experimentalSelectors on EVMFamily, pass to RunHealthCheck
- Factory-pattern family registration with logger injection (chain.Build)
- Extract ResolveKey + ResolveTriggerData to EVM family
- Extract simulator_utils.go contents; WorkflowExecutionTimeout to simulate.go
- RegisterCapabilities returns []services.Service for lifecycle parity
- getTriggerDataForFamily doc comment chain-agnostic
- PROMPT.md + SMOKE_TESTS.md audit artifacts (65 tests)

* test: add EVM chain family unit tests

Covers ResolveKey private-key matrix (valid/invalid/sentinel/empty
across broadcast and non-broadcast), LimitedEVMChain delegation for
every method, trigger hash validation and log-fetch mock scenarios,
supported-chains invariants (unique selectors, valid 20-byte
forwarders), and extended health-check labelling for experimental,
known, and unknown selectors.

100+ new test cases, zero production code changes.

* refactor: decouple TriggerParams from EVM-specific fields

Replace EVMTxHash/EVMEventIndex on TriggerParams with a generic
FamilyInputs string map so each chain family owns its input keys.
Unify EVMChainLimits as a type alias for chain.Limits, removing the
redundant interface and the type assertion in RegisterCapabilities.
Guard against the typed-nil interface trap when boxing SimulationLimits.

* refactor: decouple CLI flags from simulator core into chain family registry

Chain families now own their CLI flag definitions and input collection,
removing all EVM-specific knowledge from simulate.go. Fixes a potential
deadlock in registry.Get() by extracting namesLocked() helper, and
guards ResolveKey to skip families without configured clients.

* refactor: rename ChainWriteEVMGasLimit to ChainWriteGasLimit

Drop EVM prefix from the chain.Limits interface method to better
reflect the generic abstraction layer.

* fix: suppress ui.Success messages on non-interactive trigger path

fetchAndConvertLog now takes a verbose flag. Interactive mode emits
ui.Success messages as before; non-interactive mode stays silent,
matching the original behavior.

* refactor: rename ChainFamily to ChainType throughout simulator

Rename the chain abstraction from "family" to "type" for clearer
domain language. EVM is a chain type, not a chain family.

- ChainFamily interface → ChainType
- EVMFamily struct → EVMChainType
- family.go → chaintype.go (file rename)
- FamilyClients/Forwarders/Keys/Inputs → ChainTypeClients/etc
- getTriggerDataForFamily → getTriggerDataForChainType
- Import alias chaintype → corekeys (conflict avoidance)
- All comments, error strings, test names updated

* fix(lint): resolve golangci-lint errors

- Remove redundant copyloopvar 'tt := tt' (Go 1.22+)
- Remove unused evmCapabilityBaseStub embed in fullStubCapability
- Convert 'switch { case x == a }' to 'switch x { case a }' (staticcheck QF1002)
- Fix goimports formatting

* fix(lint): gofmt whitespace in chain/evm test files

* fix(lint): rename runRPCHealthCheck to avoid confusing-naming; annotate blank import

* fix(lint): inline comment on blank import to satisfy goimports

* refactor: address review feedback on chain family registry

Consolidated review-feedback fixes on top of the chain-family refactor.

Interface cleanup (makes the abstraction live up to its chain-agnostic
name):

- chain.Limits: narrow to ChainWriteReportSizeLimit only; move
  ChainWriteGasLimit onto a new evm.EVMChainLimits sub-interface so
  EVM-specific accessors do not leak into the generic contract.
- chain.ResolvedChains: new struct returned from ResolveClients, carrying
  clients, forwarders, and experimental-selector flags. Replaces the
  hidden cross-method state the EVM implementation was keeping on its
  struct for RunHealthCheck to read.
- ChainType.ResolveClients and ChainType.RunHealthCheck adopt the new
  return/argument shape.

User-visible parity with main:

- Per-family error prefix restored when trigger-data resolution fails,
  using the family name (e.g. 'Failed to get evm trigger data: ...').
- 'No {name} chain initialized for selector %d' uses the registry-key
  name rather than a manually-uppercased variant.

Small cleanups touched on the way:

- Hoist the 'RPC health check failed:' wrap from evm.RunRPCHealthCheck
  to the simulate.go caller so a future second chain family cannot
  produce doubled headers.
- Remove dead ManualTriggers.Close (no callers before or after this PR).
- Rename the EVMChainLimits test stub to stubEVMLimits to stop it
  colliding by one letter with the production EVMChainLimits alias.

* fix: restore disableEngineLimits call site lost in merge

Merge from main pulled in the disableEngineLimits function but the call
site in WorkflowSettingsCfgFn was dropped during conflict resolution,
leaving the function unused. Re-add the `--limits none` branch so engine
limits are actually disabled when requested.

* chore: restore getHTTPTriggerPayload comments to match main

* refactor: rename manualTriggers to triggerCaps to match main

* refactor: rename triggerCaps to manualTriggerCaps for clarity

Distinguish manual trigger capabilities (cron, HTTP) from chain-related
capabilities by using an explicit name at the call sites.

* refactor: drop redundant ChainTypeClients presence check

HasSelector guard upstream and per-selector check inside ResolveTriggerData
make the map-level check unreachable; remove to reduce indirection.

* chore: restore trigger start error message to match main

* refactor: restore service append order + trigger caps error message

- Append manualTriggerCaps cron/HTTP services immediately after creation so
  the shutdown order matches main (cron/HTTP before chain-type services).
- Revert "Failed to create cron/HTTP trigger capabilities" back to
  "Failed to create trigger capabilities" to match main.

* refactor: tighten chain capability wiring

- Rename ManualTriggers receiver t -> m to match main.
- Use corekeys.EVM directly for chain.Register; import in chain/evm package.
- Push string -> common.Address forwarder conversion into
  NewEVMChainCapabilities so it happens only for chains with clients;
  drop intermediate map in RegisterCapabilities.
- Remove Inputs.ChainTypeForwarders; read forwarders via
  ChainTypeResolved[name].Forwarders for a single source of truth.

* style(test): strip section divider comments from evm test files

* test: prune low-signal tests and consolidate test files

Remove 29 net-new tests plus delegation test file (~1100 lines) that
verified tautologies, library behaviour, compile-time invariants, or
one-line pass-throughs. Merge the 'more' and 'validation' split files
back into their single-file counterparts.

Deletions:
- limited_capabilities_delegation_test.go (whole file): verified
  stub.calls[method]==1 on pass-through methods. WriteReport gating
  logic (size+gas) remains covered by limited_capabilities_test.go.
- chaintype_test.go: literal-constant getters, crypto-library echo,
  duplicate ResolveKey cases already in the table, one-line
  delegation, compile-time interface check.
- health_more_test.go: nil/empty-map tautologies and exact duplicates
  of tests in health_test.go.
- supported_chains_test.go: pin tests (NotEmpty, EthereumMainnet,
  Sepolia with hardcoded forwarder), tautologies (Lowercased,
  Are20Bytes redundant with hex regex), compile-time struct check.
- trigger_validation_test.go: regex coverage redundant with parser
  table; misleading-name edge cases dropped; 5 genuinely novel cases
  (zero, max uint64, negative, unicode digits, tab) merged into the
  parser table.
- registry_test.go: map round-trip smokes and happy-path nil
  acceptance; all real branches covered by duplicate-panic, sort,
  defensive-copy, error-with-names, CLI-branching tests.

Consolidations:
- Merged health_more_test.go into health_test.go (11 tests total).
- Merged trigger_validation_test.go into trigger_test.go (8 tests
  plus the expanded parser table).

Kept tests exercise real branches and data invariants: ResolveKey
hex/sentinel logic, RegisterCapabilities type assertions, ExecuteTrigger
error paths, CollectCLIInputs filters, experimental/unknown health
labels, unique selectors, valid hex forwarders, chain-selector
resolution, httptest RPC round-trips in GetEVMTriggerLogFromValues,
and registry duplicate-panic / sort / defensive-copy / CLI branching.

* test: drop misleading ExecuteTrigger_WrongTriggerDataType test

The test claimed to cover the triggerData type-assertion branch in
ExecuteTrigger but its setup (empty EVMChains map) always tripped the
earlier nil-chain check first. The errorContainsAny hedge masked the
issue by accepting either error, making the test effectively a
duplicate of TestEVMChainType_ExecuteTrigger_UnknownSelector.

* test: remove unused captureStdout helper

Only call site was wrapping an empty function in
TestGetEVMTriggerLogFromValues_Success; that dead line was dropped
during the trigger test consolidation, leaving the helper unreferenced.

* style: fix gci alignment in simulate.go Inputs literal

* refactor(simulate/chain): prefix-scope trigger parser + rename HasSelector

Address PR #373 review feedback.

- Move ParseTriggerChainSelector from evm/ to shared chain/ package.
- Anchor regex on the chain-family prefix so each family only claims its
  own trigger IDs. Previously the generic (?i)chainselector:(\d+) regex
  matched any ID containing that substring, causing EVM's parser to
  claim Aptos IDs (and vice-versa) nondeterministically based on map
  iteration order.
- Rename ChainType.HasSelector -> Supports for honest semantics (false
  can mean unsupported, unconfigured, or unknown selector -- all cases
  the caller should fail on).
- Swap parser arg order to (prefix, id) mirroring the trigger ID layout.
- Tighten doc comments on the renamed interface methods.

* fix(simulate/evm): avoid deprecated ecdsa.PrivateKey.D

Replace direct reads of pk.D (deprecated in Go 1.26, SA1019) with a
helper that compares serialized key bytes against the sentinel. Fixes
ci-lint on this branch.

* refactor(simulate/evm): simplify sentinel key check

* refactor(simulate/evm): inline sentinel key check

---------

Co-authored-by: Tarcísio Zotelli Ferraz <tarcisio.ferraz@smartcontract.com>
* Add workflow list command

* Lint

* Clean up

* clean up

* Map registryID

* Review feedback

* restructure

* Review feedback
…#331)

* Add --non-interactive flag to all commands and improve CI/headless UX

* gendoc + add tests

* fix lint

* if --non-interactive is passed without --yes, fail immediately with a clear error instead of hanging on the transaction
  confirmation prompt.

* gendoc

* fix linter/tests

* gendoc

---------

Co-authored-by: Timothy Funnell <137069066+timothyF95@users.noreply.github.com>
Co-authored-by: Tarcísio Zotelli Ferraz <tarcisio.ferraz@smartcontract.com>
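The fail-fast rule for `--non-interactive` without `--yes` reduces to a single flag check made before any prompt is shown. In this sketch, `checkConfirmationFlags`, the `needsConfirmation` parameter, and the error text are hypothetical; the real commands may wire the check through their own validation hooks.

```go
package main

import "errors"

// errNeedsYes is a hypothetical error value; the CLI's actual wording may differ.
var errNeedsYes = errors.New("--non-interactive requires --yes for commands that ask for confirmation")

// checkConfirmationFlags fails fast instead of blocking on a prompt that no one
// can answer. needsConfirmation marks commands that would otherwise prompt,
// e.g. before sending a transaction.
func checkConfirmationFlags(nonInteractive, yes, needsConfirmation bool) error {
	if nonInteractive && needsConfirmation && !yes {
		return errNeedsYes
	}
	return nil
}
```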
* Add workflow list command

* Lint

* Clean up

* clean up

* Map registryID

* Review feedback

* restructure

* Review feedback

* Add json output to workflow list

* Review feedback
Update typescript SDK to 1.6.0
* Enable private workflow deployments on production

* Update tests

* make lint
* Align simulation limits with production settings

* Fix simulation limits test import formatting

* Avoid new direct time dependency in limits test
CL109-01 Owner-Key Org-Owned Requests Use Workflow-Owner-Only Authorization
…Owner (#407)

Browser Vault List And Delete Still Require Linked Workflow Owner
* Fixed 1Password op:// secrets being clobbered by .env loading

* address copilot comments

* fix logging order

---------

Co-authored-by: anirudhwarrier <12178754+anirudhwarrier@users.noreply.github.com>