diff --git a/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json b/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json new file mode 100644 index 000000000..41fab0859 --- /dev/null +++ b/.autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json @@ -0,0 +1,11 @@ +{ + "Projects": [ + { + "Name": "Amazon.Lambda.DurableExecution", + "Type": "Patch", + "ChangelogMessages": [ + "Initial preview release of the Durable Execution SDK for .NET. Build long-running Lambda workflows with automatic checkpointing via `StepAsync`, `WaitAsync`, `RunInChildContextAsync`, `CreateCallbackAsync`, and `WaitForCallbackAsync` on `IDurableContext`." + ] + } + ] +} \ No newline at end of file diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/README.md b/Libraries/src/Amazon.Lambda.DurableExecution/README.md new file mode 100644 index 000000000..ad8679002 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/README.md @@ -0,0 +1,106 @@ +# AWS Lambda Durable Execution SDK for .NET + +> **Preview.** `Amazon.Lambda.DurableExecution` is in active development (0.x). Public APIs may change before 1.0. + +`Amazon.Lambda.DurableExecution` is the .NET SDK for building resilient, long-running AWS Lambda functions that automatically checkpoint progress and resume after failures. Workflows can run for up to one year, with charges only for active compute time. + +## Key Features + +- **Automatic checkpointing** — progress is saved after each step; failures resume from the last checkpoint. +- **Cost-effective waits** — suspend execution for minutes, hours, or days without compute charges. +- **Configurable retries** — built-in retry strategies with exponential backoff and jitter. +- **Replay safety** — functions deterministically resume from checkpoints after interruptions. +- **Type safety** — full generic type support for step results. +- **AOT-friendly** — pluggable `ILambdaSerializer` so you can register `SourceGeneratorLambdaJsonSerializer` for trimmed / Native AOT functions. + +## How It Works + +Your handler delegates to `DurableFunction.WrapAsync`, which gives your workflow function an `IDurableContext`. The context is your interface to durable operations: + +- `ctx.StepAsync` — run code and checkpoint the result. ([docs](docs/core/steps.md)) +- `ctx.WaitAsync` — suspend execution without compute charges. ([docs](docs/core/wait.md)) +- `ctx.CreateCallbackAsync` / `ctx.WaitForCallbackAsync` — wait for external events (approvals, webhooks). ([docs](docs/core/callbacks.md)) +- `ctx.RunInChildContextAsync` — run an isolated child context with its own checkpoint log. ([docs](docs/core/child-contexts.md)) + +## Quick Start + +### Installation + +```bash +dotnet add package Amazon.Lambda.DurableExecution +``` + +### Your first durable function + +> **Programming model:** the preview only supports the **executable programming model** — your function is an executable assembly that hosts its own bootstrap loop and passes the serializer to the runtime in code. Class-library handlers on the managed runtime will be supported once `Amazon.Lambda.RuntimeSupport` ships the changes that let `DurableFunction.WrapAsync` resolve the serializer from `ILambdaContext.Serializer`. This README will be updated then. + +A complete order-processing workflow with two steps and a wait, deployed as an executable assembly on the `dotnet10` runtime. `Main` builds a `LambdaBootstrap` with your handler and an `ILambdaSerializer`, and `DurableFunction.WrapAsync` uses that serializer to checkpoint step inputs and outputs. + +```csharp +using Amazon.Lambda.Core; +using Amazon.Lambda.DurableExecution; +using Amazon.Lambda.RuntimeSupport; +using Amazon.Lambda.Serialization.SystemTextJson; + +namespace OrderProcessor; + +public class OrderProcessor +{ + public static async Task Main() + { + var handler = new OrderProcessor(); + var serializer = new DefaultLambdaJsonSerializer(); + using var wrapper = HandlerWrapper.GetHandlerWrapper( + handler.Handler, serializer); + using var bootstrap = new LambdaBootstrap(wrapper); + await bootstrap.RunAsync(); + } + + public Task Handler( + DurableExecutionInvocationInput input, ILambdaContext context) + => DurableFunction.WrapAsync(Workflow, input, context); + + private async Task Workflow(Order order, IDurableContext ctx) + { + var reservation = await ctx.StepAsync( + async _ => await InventoryService.ReserveAsync(order.Items), + name: "reserve-inventory"); + + var payment = await ctx.StepAsync( + async _ => await PaymentService.ChargeAsync(order.PaymentMethod, order.Total), + name: "process-payment"); + + await ctx.WaitAsync(TimeSpan.FromHours(2), name: "warehouse-processing"); + + var shipment = await ctx.StepAsync( + async _ => await ShippingService.ShipAsync(reservation, order.Address), + name: "confirm-shipment"); + + return new OrderResult(order.Id, shipment.TrackingNumber); + } +} + +public record Order(string Id, IReadOnlyList Items, PaymentMethod PaymentMethod, decimal Total, Address Address); +public record OrderResult(string OrderId, string TrackingNumber); +``` + +For AOT or trim-friendly serialization, swap `DefaultLambdaJsonSerializer` for `SourceGeneratorLambdaJsonSerializer` and register your `JsonSerializerContext`. + +## Documentation + +**Core operations** + +- [Steps](docs/core/steps.md) — execute code with automatic checkpointing, retry strategies, and at-least/at-most-once semantics. +- [Wait](docs/core/wait.md) — pause execution without compute charges. +- [Callbacks](docs/core/callbacks.md) — wait for external systems to respond. +- [Child Contexts](docs/core/child-contexts.md) — group related operations into isolated, checkpointed units. + +**Examples** + +End-to-end test functions (each paired with an integration test) live under `Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/`. + +## Related SDKs + +- [aws-durable-execution-sdk-java](https://github.com/aws/aws-durable-execution-sdk-java) — Java SDK +- [aws-durable-execution-sdk-js](https://github.com/aws/aws-durable-execution-sdk-js) — JavaScript / TypeScript SDK +- [aws-durable-execution-sdk-python](https://github.com/aws/aws-durable-execution-sdk-python) — Python SDK diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md new file mode 100644 index 000000000..94bbd838d --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/callbacks.md @@ -0,0 +1,83 @@ +# Callbacks + +Callbacks let a workflow suspend until an external system (a human approver, a webhook, another service) delivers a result. The external system completes the callback by calling `SendDurableExecutionCallbackSuccess`, `SendDurableExecutionCallbackFailure`, or `SendDurableExecutionCallbackHeartbeat` with the `callbackId` you handed it. + +Two APIs are available: + +- `WaitForCallbackAsync` — composite operation; create the callback, hand it to the external system inside a submitter delegate, and suspend until the result arrives. +- `CreateCallbackAsync` — lower-level; allocate the callback yourself, hand the ID out in your own steps, and `await` the result separately. + +## `WaitForCallbackAsync` + +```csharp +Task WaitForCallbackAsync( + Func submitter, + string? name = null, + WaitForCallbackConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The submitter receives the freshly allocated `callbackId` and an `IWaitForCallbackContext` (logger-only). Submitter failures (after retries are exhausted) surface as `CallbackSubmitterException`; callback failures and timeouts surface as `CallbackFailedException` / `CallbackTimeoutException`. + +```csharp +var result = await ctx.WaitForCallbackAsync( + submitter: async (callbackId, cbCtx) => + { + var payload = $$"""{"callbackId":"{{callbackId}}","orderId":"{{input.OrderId}}"}"""; + await LambdaClient.InvokeAsync(new InvokeRequest + { + FunctionName = externalFunctionName, + InvocationType = InvocationType.Event, + Payload = payload + }); + }, + name: "approve"); +``` + +## `CreateCallbackAsync` + +```csharp +Task> CreateCallbackAsync( + string? name = null, + CallbackConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The returned `ICallback` exposes: + +- `string CallbackId` — give this to the external system. +- `Task GetResultAsync(CancellationToken)` — `await` to suspend until the external system completes the callback. + +The result is deserialized using the registered `ILambdaSerializer`. Throws `CallbackFailedException` or `CallbackTimeoutException` on failure. + +```csharp +var cb = await ctx.CreateCallbackAsync(name: "approve"); + +await ctx.StepAsync(async _ => +{ + var payload = $$"""{"callbackId":"{{cb.CallbackId}}","orderId":"{{input.OrderId}}"}"""; + await LambdaClient.InvokeAsync(new InvokeRequest + { + FunctionName = externalFunctionName, + InvocationType = InvocationType.Event, + Payload = payload + }); +}, name: "submit"); + +return await cb.GetResultAsync(); +``` + +## Configuration + +```csharp +public class CallbackConfig +{ + public TimeSpan Timeout { get; set; } // overall callback timeout, ≥ 1s or Zero (default = no timeout) + public TimeSpan HeartbeatTimeout { get; set; } // heartbeat-gap timeout, ≥ 1s or Zero (default = no timeout) +} + +public class WaitForCallbackConfig : CallbackConfig +{ + public IRetryStrategy? RetryStrategy { get; set; } // applied to the submitter step only +} +``` diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md new file mode 100644 index 000000000..4a664e11e --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/child-contexts.md @@ -0,0 +1,46 @@ +# Child Contexts + +`RunInChildContextAsync` runs a sub-workflow inside its own deterministic operation-ID space. The child's return value is checkpointed as a single `CONTEXT` operation, so subsequent invocations replay the cached value without re-executing the contained operations. Use to group related steps under a shared error/observability boundary. + +## Signatures + +```csharp +Task RunInChildContextAsync( + Func> func, + string? name = null, + ChildContextConfig? config = null, + CancellationToken cancellationToken = default); + +Task RunInChildContextAsync( + Func func, + string? name = null, + ChildContextConfig? config = null, + CancellationToken cancellationToken = default); +``` + +## Example + +```csharp +var phaseResult = await ctx.RunInChildContextAsync( + async childCtx => + { + var validated = await childCtx.StepAsync(async _ => Validate(input), name: "validate"); + await childCtx.WaitAsync(TimeSpan.FromSeconds(2), name: "short_wait"); + var processed = await childCtx.StepAsync(async _ => Process(validated), name: "process"); + return processed; + }, + name: "phase", + config: new ChildContextConfig { SubType = "OrderProcessing" }); +``` + +## Configuration + +```csharp +public sealed class ChildContextConfig +{ + public string? SubType { get; set; } // observability label + public Func? ErrorMapping { get; set; } // remap thrown exceptions +} +``` + +`ErrorMapping` lets you translate exceptions thrown inside the child context into a domain-specific exception type before they propagate to the parent. diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md new file mode 100644 index 000000000..c7f9e9f22 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md @@ -0,0 +1,148 @@ +# Steps + +`StepAsync` runs a unit of work whose result is checkpointed. On replay, completed steps return their cached result without re-executing. + +## Signatures + +```csharp +Task StepAsync( + Func> func, + string? name = null, + StepConfig? config = null, + CancellationToken cancellationToken = default); + +Task StepAsync( + Func func, + string? name = null, + StepConfig? config = null, + CancellationToken cancellationToken = default); +``` + +The `IStepContext` parameter exposes the current `AttemptNumber`, the deterministic `OperationId`, and a scoped `Logger`. Returned values are serialized via the `ILambdaSerializer` registered on `ILambdaContext.Serializer`. + +## Basic step + +```csharp +var user = await ctx.StepAsync( + async _ => await userService.GetUserAsync(userId), + name: "fetch-user"); +``` + +## Multiple steps + +```csharp +var a = await ctx.StepAsync(async _ => $"a-{input.OrderId}", name: "step_1"); +var b = await ctx.StepAsync(async _ => $"{a}-b", name: "step_2"); +var c = await ctx.StepAsync(async _ => $"{b}-c", name: "step_3"); +``` + +## Step configuration + +Configure step behavior with `StepConfig`: + +```csharp +public sealed class StepConfig +{ + public IRetryStrategy? RetryStrategy { get; set; } // null = no retry + public StepSemantics Semantics { get; set; } = StepSemantics.AtLeastOncePerRetry; +} +``` + +### Retry strategies + +When a step throws, the configured `IRetryStrategy` decides whether to retry and after what delay. + +```csharp +public interface IRetryStrategy +{ + RetryDecision ShouldRetry(Exception exception, int attemptNumber); +} + +public readonly struct RetryDecision +{ + public bool ShouldRetry { get; } + public TimeSpan Delay { get; } + + public static RetryDecision DoNotRetry(); + public static RetryDecision RetryAfter(TimeSpan delay); +} +``` + +Built-in strategies on the `RetryStrategy` static class: + +| Member | Behavior | +| --- | --- | +| `RetryStrategy.Default` | 6 attempts, 2× backoff, 5s initial, 60s max, Full jitter. | +| `RetryStrategy.Transient` | 3 attempts, 2× backoff, 1s initial, 5s max, Half jitter. | +| `RetryStrategy.None` | 1 attempt only — no retry. | +| `RetryStrategy.Exponential(...)` | Builder for custom exponential strategies. | +| `RetryStrategy.FromDelegate(Func)` | Wrap a custom decision function. | + +`Exponential` parameters: + +```csharp +public static IRetryStrategy Exponential( + int maxAttempts = 3, + TimeSpan? initialDelay = null, // default 5s + TimeSpan? maxDelay = null, // default 300s + double backoffRate = 2.0, + JitterStrategy jitter = JitterStrategy.Full, + Type[]? retryableExceptions = null, + string[]? retryableMessagePatterns = null); + +public enum JitterStrategy { None, Full, Half } +``` + +When `retryableExceptions` and `retryableMessagePatterns` are both null (default), every exception is retried up to `maxAttempts`. If either is set, only matching exceptions are retried. + +#### Step with retries + +```csharp +var result = await ctx.StepAsync( + async stepCtx => + { + if (stepCtx.AttemptNumber < 3) + throw new InvalidOperationException($"flake on attempt {stepCtx.AttemptNumber}"); + return $"ok on attempt {stepCtx.AttemptNumber}"; + }, + name: "flaky_step", + config: new StepConfig + { + RetryStrategy = RetryStrategy.Exponential( + maxAttempts: 3, + initialDelay: TimeSpan.FromSeconds(2), + maxDelay: TimeSpan.FromSeconds(10), + backoffRate: 2.0, + jitter: JitterStrategy.None) + }); +``` + +### Step semantics + +Control how a step behaves when interrupted mid-execution: + +```csharp +public enum StepSemantics +{ + AtLeastOncePerRetry, // default — body may re-execute if Lambda is re-invoked mid-attempt + AtMostOncePerRetry // body executes at most once per retry attempt +} +``` + +| Semantic | Behavior | Use case | +| --- | --- | --- | +| `AtLeastOncePerRetry` (default) | Re-executes the step if interrupted before completion. | Idempotent operations (database upserts, API calls with idempotency keys). | +| `AtMostOncePerRetry` | Never re-executes; throws if interrupted. | Non-idempotent operations (sending email, charging payments). | + +These semantics apply *per retry attempt*, not per overall execution. To achieve true at-most-once across the whole workflow, combine with `RetryStrategy.None`: + +```csharp +var result = await ctx.StepAsync( + async _ => await paymentService.ChargeAsync(amount), + name: "charge-payment", + config: new StepConfig + { + Semantics = StepSemantics.AtMostOncePerRetry, + RetryStrategy = RetryStrategy.None + }); +``` diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md new file mode 100644 index 000000000..d7d2679f4 --- /dev/null +++ b/Libraries/src/Amazon.Lambda.DurableExecution/docs/core/wait.md @@ -0,0 +1,28 @@ +# Wait + +`WaitAsync` suspends the workflow for a duration. The Lambda terminates and is re-invoked when the timer fires — you pay for compute time only on the resume side. + +## Signature + +```csharp +Task WaitAsync( + TimeSpan duration, + string? name = null, + CancellationToken cancellationToken = default); +``` + +`duration` must be at least 1 second and at most 31,622,400 seconds (~1 year). + +## Example + +```csharp +await ctx.WaitAsync(TimeSpan.FromHours(2), name: "warehouse-processing"); +``` + +## Step + Wait + Step + +```csharp +var validated = await ctx.StepAsync(async _ => Validate(input), name: "validate"); +await ctx.WaitAsync(TimeSpan.FromSeconds(3), name: "short_wait"); +var processed = await ctx.StepAsync(async _ => Process(validated), name: "process"); +``` diff --git a/README.md b/README.md index 405e952a5..afd2c11e3 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,7 @@ For a history of releases view the [release change log](CHANGELOG.md) - [Amazon.Lambda.Annotations](#amazonlambdaannotations) - [Amazon.Lambda.AspNetCoreServer](#amazonlambdaaspnetcoreserver) - [Amazon.Lambda.TestUtilities](#amazonlambdatestutilities) + - [Amazon.Lambda.DurableExecution](#amazonlambdadurableexecution) - [Blueprints](#blueprints) - [Dotnet CLI Templates](#dotnet-cli-templates) - [Yeoman (Deprecated)](#yeoman-deprecated) @@ -113,6 +114,11 @@ For more information see the [README.md](Libraries/src/Amazon.Lambda.AspNetCoreS Package includes test implementation of the interfaces from Amazon.Lambda.Core and helper methods to help in locally testing. For more information see the [README.md](Libraries/src/Amazon.Lambda.TestUtilities/README.md) file for Amazon.Lambda.TestUtilities. +### Amazon.Lambda.DurableExecution + +The Durable Execution SDK lets you write multi-step Lambda workflows that automatically checkpoint progress and resume after failures. +For more information see the [README.md](Libraries/src/Amazon.Lambda.DurableExecution/README.md) file for Amazon.Lambda.DurableExecution. + ## Blueprints Blueprints in this repository are .NET Core Lambda functions that can used to get started. In Visual Studio the Blueprints are available when creating a new project and selecting the AWS Lambda Project.