Skip to content

Fix stress pipeline: rename variables to avoid env var injection, simplify runner#4329

Open
paulmedynski wants to merge 10 commits into
mainfrom
dev/paul/stress
Open

Fix stress pipeline: rename variables to avoid env var injection, simplify runner#4329
paulmedynski wants to merge 10 commits into
mainfrom
dev/paul/stress

Conversation

@paulmedynski
Copy link
Copy Markdown
Contributor

@paulmedynski paulmedynski commented Jun 1, 2026

Summary

Fixes stress tests running in Help mode on CI instead of executing tests. The root cause was the dotnet CLI's System.CommandLine silently injecting pipeline variable content into application arguments.

Root Cause

The dotnet CLI reads environment variables named {COMMAND}ARGUMENTS (e.g. RUNARGUMENTS, BUILDARGUMENTS, TESTARGUMENTS) and prepends their content to the parsed arguments for that subcommand. Azure DevOps automatically exposes all pipeline variables as uppercased environment variables, so a pipeline variable named buildArguments became BUILDARGUMENTS, and testArguments became TESTARGUMENTS — causing their content to leak into the application's args[] and triggering Help mode.

This affects all .NET SDK versions (8.0+) and is invisible in [command] log lines.

You can see the pipeline executing the tests now:

They aren't passing, but that's a problem for another PR.

Changes

Pipeline Default Behaviour

  • Changed the default bhaviour to fail the entire pipeline if any stress test steps fail.
    • This is the opposite of it's original configuration, which would pass with issues if any tests failed.
    • This will cause PRs to show failing checks, but will not prevent PRs from merging (stress tests aren't required checks).

Pipeline variable rename (stress-tests-job.yml)

  • Rename buildArgumentsdotnetBuildOpts
  • Rename testArgumentsstressTestOpts
  • Add dotnetRunOpts variable for shared dotnet run options
  • Use ${{ variables.* }} compile-time expansion for all static variables
  • Add warning comment explaining the {COMMAND}ARGUMENTS env var footgun

Pool name auto-selection (stress-tests-stage.yml)

  • Automatically select 1ES pool based on ADO project name:
    • ADO.NetADO-1ES-Pool
    • publicADO-CI-1ES-Pool
  • Uses ${{ if eq(variables['System.TeamProject'], 'ADO.Net') }} at compile time
  • Removes hardcoded pool names from job invocations

Stress runner simplification (Program.cs)

  • Upgrade to use System.CommandLine
  • Remove dead code paths (RunMode.RunOne, RunMode.Profile, XML config)
  • Remove PlatformAttribute (unused)
  • Remove version detection logic (unused)
  • Simplify to just RunAll and Help modes

Pipeline structure (stress-tests-pipeline.yml, stress-tests-stage.yml)

  • Remove unnecessary parameter defaults (require explicit values)
  • Thread parameters cleanly through pipeline → stage → job
  • Explicitly pass netFrameworkTestRuntimes: [] for Linux/macOS jobs

Instructions update (.github/instructions/ado-pipelines.instructions.md)

  • Document the {COMMAND}ARGUMENTS env var injection gotcha
  • List forbidden variable names and safe alternatives

Verification

  • Confirmed the exact CI command was triggering Help mode due to env var injection
  • Verified variable rename eliminates the injection (Run 15321)
  • Confirmed env: property correctly sets STRESS_CONFIG_FILE for the spawned process
  • Verified pool name auto-selection compiles correctly for both projects

… warnOnTestFailure

- Move STRESS_CONFIG_FILE from dotnet run -e arg to task env: property
  to fix DotNetCoreCLI@2 leaking the env var value as an app argument,
  which hit the default switch case and set RunMode.Help.

- Rename failOnTestFailure to warnOnTestFailure (default false) with
  inverted semantics: false = failures fail the run, true = warnings only.
  All test steps always execute regardless of this setting.
Copilot AI review requested due to automatic review settings June 1, 2026 13:56
@github-project-automation github-project-automation Bot moved this to To triage in SqlClient Board Jun 1, 2026
@paulmedynski paulmedynski added Area\Tests Issues that are targeted to tests or test projects Area\Engineering Use this for issues that are targeted for changes in the 'eng' folder or build systems. labels Jun 1, 2026
@paulmedynski paulmedynski moved this from To triage to In progress in SqlClient Board Jun 1, 2026
@paulmedynski paulmedynski added this to the 7.1.0-preview2 milestone Jun 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Azure DevOps stress test runs accidentally executing in “Help” mode by moving STRESS_CONFIG_FILE injection from dotnet run -e ... into the supported env: block for DotNetCoreCLI@2. It also renames the failure-handling parameter to warnOnTestFailure and aligns continueOnError semantics with that name.

Changes:

  • Set STRESS_CONFIG_FILE via env: on the DotNetCoreCLI@2 run tasks (instead of dotnet run -e ...) to prevent argument leakage into the runner.
  • Replace failOnTestFailure with warnOnTestFailure (default false) and update continueOnError accordingly.
  • Add --no-build to shared runArguments so dotnet run consistently uses the already-built outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
eng/pipelines/stress/stress-tests-stage.yml Renames and threads the new warnOnTestFailure parameter into the stage’s OS-specific jobs.
eng/pipelines/stress/stress-tests-pipeline.yml Exposes warnOnTestFailure as a pipeline parameter and passes it to the stress test stage.
eng/pipelines/stress/stress-tests-job.yml Applies warnOnTestFailure to continueOnError and sets STRESS_CONFIG_FILE via task env: for each runtime run.

- Remove RunMode enum, RunVerify, ExitWithError, and Help mode
- Init() returns exit code directly; Main() short-circuits on non-zero
- Rename RunStress() to Run() (only remaining path)
- Fix GetMdsVersion() to use correct namespace (Microsoft.Data.SqlClient.ThisAssembly)
- PrintHelp() dumps received args with quoting for diagnostics
Copilot AI review requested due to automatic review settings June 1, 2026 14:36
The task's 'command: run' mode puts all 'arguments' content after '--'
as application args, ignoring the user-specified '--' separator. This
caused dotnet-run flags (--verbosity, --configuration, --no-build, -f)
to be passed to the stress runner app instead of to dotnet-run itself.

Switch to 'command: custom' + 'custom: run' which passes arguments
literally, giving us full control over placement.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/Microsoft.Data.SqlClient/tests/StressTests/SqlClient.Stress.Runner/Program.cs Outdated
Comment thread eng/pipelines/stress/stress-tests-stage.yml Outdated
Copilot AI review requested due to automatic review settings June 1, 2026 15:44
…gression

dotnet run in SDK 10.0.300 incorrectly forwards its own options
(--verbosity, --configuration, --no-build) as application arguments
on the CI agent (Ubuntu 22.04). This cannot be reproduced locally
with the same SDK version but manifests consistently on hosted agents.

Since we build separately anyway (--no-build), switch to dotnet exec
which directly invokes the built DLL without any argument parsing
ambiguity.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/Microsoft.Data.SqlClient/tests/StressTests/SqlClient.Stress.Runner/Program.cs Outdated
Comment thread eng/pipelines/stress/stress-tests-job.yml Outdated
- Remove stale -all and -verify options from PrintHelp (no run modes remain)
- Clarify netTestRuntimes comment to note net10.0 is tested but not shipped
- Use CmdLine@2 for .NET Framework test loop (dotnet exec cannot run net462 exe)
The dotnet CLI's System.CommandLine reads environment variables named
{COMMAND}ARGUMENTS (e.g. RUNARGUMENTS, BUILDARGUMENTS, TESTARGUMENTS)
and silently injects their content into parsed arguments. Since ADO
exposes all pipeline variables as uppercased env vars, variables named
'runArguments', 'buildArguments', or 'testArguments' caused the stress
runner to receive SDK options in its args[], triggering Help mode.

Changes:
- Rename buildArguments -> dotnetBuildOpts
- Rename testArguments -> stressTestOpts
- Add dotnetRunOpts variable for shared dotnet run options
- Revert dotnet exec back to dotnet run (the real fix is the rename)
- Use ${{ variables.* }} for compile-time expansion
- Document the gotcha in ado-pipelines.instructions.md
Copilot AI review requested due to automatic review settings June 2, 2026 19:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 3, 2026 11:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/Microsoft.Data.SqlClient/tests/StressTests/SqlClient.Stress.Runner/Program.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/tests/StressTests/SqlClient.Stress.Runner/Program.cs Outdated
@paulmedynski paulmedynski changed the title Fix stress tests running in Help mode; replace failOnTestFailure with warnOnTestFailure Fix stress pipeline: rename variables to avoid env var injection, simplify runner Jun 3, 2026
- Upgraded the Runner to use System.CommandLine instead of trying to parse arguments by hand.
@paulmedynski paulmedynski marked this pull request as ready for review June 3, 2026 12:38
@paulmedynski paulmedynski requested review from a team and Copilot June 3, 2026 12:38
@paulmedynski paulmedynski requested a review from a team as a code owner June 3, 2026 12:38
@paulmedynski paulmedynski enabled auto-merge (squash) June 3, 2026 12:38
@paulmedynski paulmedynski moved this from In progress to In review in SqlClient Board Jun 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 53 out of 53 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/Microsoft.Data.SqlClient/tests/StressTests/SqlClient.Stress.Runner/Monitor/MonitorLoadUtils.cs:22

  • mainAssembly.GetType(classname) can return null (and in the current layout it will, since mainAssembly is the Monitoring assembly that contains IMonitorLoader, but it does not define Microsoft.Data.SqlClient.Test.Stress.MonitorLoader). The subsequent t.GetInterfaces() will throw a NullReferenceException, and enabling monitoring will fail with an unhelpful crash. Add an explicit null check with a clear error so failures are actionable (and ensure the expected IMonitorLoader implementation type exists in the target assembly).


## Variable Naming — Avoid `{COMMAND}ARGUMENTS` Names

The dotnet CLI (via System.CommandLine) reads environment variables named `{COMMAND}ARGUMENTS` and silently injects their content into the parsed arguments for that subcommand. Because Azure DevOps automatically exposes all pipeline variables as uppercased environment variables, a pipeline variable named `runArguments` becomes `RUNARGUMENTS`, which `dotnet run` reads and injects into the application's `args[]` — bypassing the `--` separator.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took many hours to figure out.

# True to enable debugging steps.
- name: debug
type: boolean
default: false
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding template parameter default values except at the top level.

# warnings but do not fail the overall pipeline run.
- name: failOnTestFailure
displayName: Fail pipeline on test failure
# When true, test failures produce warnings (SucceededWithIssues) but do not fail the pipeline.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flips the default behaviour - now the entire pipeline run will fail if any tests steps fail.

command: build
projects: $(project)
arguments: $(buildArguments)
arguments: ${{ variables.dotnetBuildOpts }}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to expand static variables at pipeline expansion time, rather than runtime.

condition: and(succeededOrFailed(), eq(variables['buildSucceeded'], 'true'))
continueOnError: ${{ not(parameters.failOnTestFailure) }}
continueOnError: ${{ parameters.warnOnTestFailure }}
env:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more robust way of injecting environment variables, rather than using the -e argument.

# Triggering pipeline: MDS Main CI (branch internal/main only)
# Pipeline name: sqlclient-stress
# Pipeline URL: TODO: add URL when pipeline is created
# Pipeline URL: https://dev.azure.com/SqlClientDrivers/ADO.Net/_build?definitionId=2284
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a sqlclient-stress pipeline in the ADO.Net project, so this will now trigger on successful completion of the MDS Main CI pipeline runs in that project.

- name: saPassword
value: $[stageDependencies.secrets_stage.secrets_job.outputs['SaPassword.Value']]

# The 1ES pool name, determined automatically by ADO project:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a superior approach to choosing pool name. We should plan to update our other cross-project pipelines, and remove the variable group $(ci_var_poolName) approach. Our pipelines need to explicitly support all target Azure DevOps projects anyway, so no harm in hardcoding their names and pools.

Copy link
Copy Markdown
Member

@cheenamalhotra cheenamalhotra Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding 1ES pool or image names can be tedious to manage and the more pipelines we get - the name of images can get more widespread and lost in the files.

I would suggest we use a structured variable file (single 1ES artifact) that contains only 1ES pool and image names so if you need to identify and list all the 1ES pools and images being used in this repo, you can gather them from 1 location, or if any one image needs to replaced by something new you don't need to hunt down all pipelines but can update it from one location.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using System.Collections.Generic;

namespace Monitoring
namespace Microsoft.Data.SqlClient.Test.Stress
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave all of these classes a proper namespace.

public static void Init(string[] args)
static int Main(string[] args)
{
for (int i = 0; i < args.Length; i++)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual command-line parsing contributed to problems diagnosing the root cause, so I modernized it all with System.CommandLine.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.94%. Comparing base (bfbdd30) to head (98f2b89).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4329      +/-   ##
==========================================
- Coverage   66.69%   64.94%   -1.75%     
==========================================
  Files         284      279       -5     
  Lines       43238    66069   +22831     
==========================================
+ Hits        28836    42908   +14072     
- Misses      14402    23161    +8759     
Flag Coverage Δ
CI-SqlClient ?
PR-SqlClient-Project 64.94% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area\Engineering Use this for issues that are targeted for changes in the 'eng' folder or build systems. Area\Tests Issues that are targeted to tests or test projects

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

5 participants