Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions .autover/changes/91693d62-b0c7-49b0-a74f-531aa1509864.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{
"Projects": [
{
"Name": "Amazon.Lambda.DurableExecution",
"Type": "Patch",
"ChangelogMessages": [
"Initial preview release of the Durable Execution SDK for .NET. Build long-running Lambda workflows with automatic checkpointing via `StepAsync`, `WaitAsync`, `RunInChildContextAsync`, `CreateCallbackAsync`, and `WaitForCallbackAsync` on `IDurableContext`."
]
}
]
}
106 changes: 106 additions & 0 deletions Libraries/src/Amazon.Lambda.DurableExecution/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
# AWS Lambda Durable Execution SDK for .NET

> **Preview.** `Amazon.Lambda.DurableExecution` is in active development (0.x). Public APIs may change before 1.0.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't have the deployment of new Amazon.Lambda.RuntimeSupport with support for accessing the serializer you should call out only executable programming model is supported for now.

I would suggest making all of the examples be executable mode as well and later we can revise the README once the managed runtime is updated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


`Amazon.Lambda.DurableExecution` is the .NET SDK for building resilient, long-running AWS Lambda functions that automatically checkpoint progress and resume after failures. Workflows can run for up to one year, with charges only for active compute time.

## Key Features

- **Automatic checkpointing** — progress is saved after each step; failures resume from the last checkpoint.
- **Cost-effective waits** — suspend execution for minutes, hours, or days without compute charges.
- **Configurable retries** — built-in retry strategies with exponential backoff and jitter.
- **Replay safety** — functions deterministically resume from checkpoints after interruptions.
- **Type safety** — full generic type support for step results.
- **AOT-friendly** — pluggable `ILambdaSerializer` so you can register `SourceGeneratorLambdaJsonSerializer<TContext>` for trimmed / Native AOT functions.

## How It Works

Your handler delegates to `DurableFunction.WrapAsync`, which gives your workflow function an `IDurableContext`. The context is your interface to durable operations:

- `ctx.StepAsync` — run code and checkpoint the result. ([docs](docs/core/steps.md))
- `ctx.WaitAsync` — suspend execution without compute charges. ([docs](docs/core/wait.md))
- `ctx.CreateCallbackAsync` / `ctx.WaitForCallbackAsync` — wait for external events (approvals, webhooks). ([docs](docs/core/callbacks.md))
- `ctx.RunInChildContextAsync` — run an isolated child context with its own checkpoint log. ([docs](docs/core/child-contexts.md))

## Quick Start

### Installation

```bash
dotnet add package Amazon.Lambda.DurableExecution
```

### Your first durable function

> **Programming model:** the preview only supports the **executable programming model** — your function is an executable assembly that hosts its own bootstrap loop and passes the serializer to the runtime in code. Class-library handlers on the managed runtime will be supported once `Amazon.Lambda.RuntimeSupport` ships the changes that let `DurableFunction.WrapAsync` resolve the serializer from `ILambdaContext.Serializer`. This README will be updated then.

A complete order-processing workflow with two steps and a wait, deployed as an executable assembly on the `dotnet10` runtime. `Main` builds a `LambdaBootstrap` with your handler and an `ILambdaSerializer`, and `DurableFunction.WrapAsync` uses that serializer to checkpoint step inputs and outputs.

```csharp
using Amazon.Lambda.Core;
using Amazon.Lambda.DurableExecution;
using Amazon.Lambda.RuntimeSupport;
using Amazon.Lambda.Serialization.SystemTextJson;

namespace OrderProcessor;

public class OrderProcessor
{
public static async Task Main()
{
var handler = new OrderProcessor();
var serializer = new DefaultLambdaJsonSerializer();
using var wrapper = HandlerWrapper.GetHandlerWrapper<DurableExecutionInvocationInput, DurableExecutionInvocationOutput>(
handler.Handler, serializer);
using var bootstrap = new LambdaBootstrap(wrapper);
await bootstrap.RunAsync();
}

public Task<DurableExecutionInvocationOutput> Handler(
DurableExecutionInvocationInput input, ILambdaContext context)
=> DurableFunction.WrapAsync<Order, OrderResult>(Workflow, input, context);

private async Task<OrderResult> Workflow(Order order, IDurableContext ctx)
{
var reservation = await ctx.StepAsync(
async _ => await InventoryService.ReserveAsync(order.Items),
name: "reserve-inventory");

var payment = await ctx.StepAsync(
async _ => await PaymentService.ChargeAsync(order.PaymentMethod, order.Total),
name: "process-payment");

await ctx.WaitAsync(TimeSpan.FromHours(2), name: "warehouse-processing");

var shipment = await ctx.StepAsync(
async _ => await ShippingService.ShipAsync(reservation, order.Address),
name: "confirm-shipment");

return new OrderResult(order.Id, shipment.TrackingNumber);
}
}

public record Order(string Id, IReadOnlyList<OrderItem> Items, PaymentMethod PaymentMethod, decimal Total, Address Address);
public record OrderResult(string OrderId, string TrackingNumber);
```

For AOT or trim-friendly serialization, swap `DefaultLambdaJsonSerializer` for `SourceGeneratorLambdaJsonSerializer<TContext>` and register your `JsonSerializerContext`.

## Documentation

**Core operations**

- [Steps](docs/core/steps.md) — execute code with automatic checkpointing, retry strategies, and at-least/at-most-once semantics.
- [Wait](docs/core/wait.md) — pause execution without compute charges.
- [Callbacks](docs/core/callbacks.md) — wait for external systems to respond.
- [Child Contexts](docs/core/child-contexts.md) — group related operations into isolated, checkpointed units.

**Examples**

End-to-end test functions (each paired with an integration test) live under `Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/`.

## Related SDKs

- [aws-durable-execution-sdk-java](https://github.com/aws/aws-durable-execution-sdk-java) — Java SDK
- [aws-durable-execution-sdk-js](https://github.com/aws/aws-durable-execution-sdk-js) — JavaScript / TypeScript SDK
- [aws-durable-execution-sdk-python](https://github.com/aws/aws-durable-execution-sdk-python) — Python SDK
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
# Callbacks

Callbacks let a workflow suspend until an external system (a human approver, a webhook, another service) delivers a result. The external system completes the callback by calling `SendDurableExecutionCallbackSuccess`, `SendDurableExecutionCallbackFailure`, or `SendDurableExecutionCallbackHeartbeat` with the `callbackId` you handed it.

Two APIs are available:

- `WaitForCallbackAsync<T>` — composite operation; create the callback, hand it to the external system inside a submitter delegate, and suspend until the result arrives.
- `CreateCallbackAsync<T>` — lower-level; allocate the callback yourself, hand the ID out in your own steps, and `await` the result separately.

## `WaitForCallbackAsync<T>`

```csharp
Task<T> WaitForCallbackAsync<T>(
Func<string, IWaitForCallbackContext, Task> submitter,
string? name = null,
WaitForCallbackConfig? config = null,
CancellationToken cancellationToken = default);
```

The submitter receives the freshly allocated `callbackId` and an `IWaitForCallbackContext` (logger-only). Submitter failures (after retries are exhausted) surface as `CallbackSubmitterException`; callback failures and timeouts surface as `CallbackFailedException` / `CallbackTimeoutException`.

```csharp
var result = await ctx.WaitForCallbackAsync<MyResult>(
submitter: async (callbackId, cbCtx) =>
{
var payload = $$"""{"callbackId":"{{callbackId}}","orderId":"{{input.OrderId}}"}""";
await LambdaClient.InvokeAsync(new InvokeRequest
{
FunctionName = externalFunctionName,
InvocationType = InvocationType.Event,
Payload = payload
});
},
name: "approve");
```

## `CreateCallbackAsync<T>`

```csharp
Task<ICallback<T>> CreateCallbackAsync<T>(
string? name = null,
CallbackConfig? config = null,
CancellationToken cancellationToken = default);
```

The returned `ICallback<T>` exposes:

- `string CallbackId` — give this to the external system.
- `Task<T> GetResultAsync(CancellationToken)` — `await` to suspend until the external system completes the callback.

The result is deserialized using the registered `ILambdaSerializer`. Throws `CallbackFailedException` or `CallbackTimeoutException` on failure.

```csharp
var cb = await ctx.CreateCallbackAsync<MyResult>(name: "approve");

await ctx.StepAsync(async _ =>
{
var payload = $$"""{"callbackId":"{{cb.CallbackId}}","orderId":"{{input.OrderId}}"}""";
await LambdaClient.InvokeAsync(new InvokeRequest
{
FunctionName = externalFunctionName,
InvocationType = InvocationType.Event,
Payload = payload
});
}, name: "submit");

return await cb.GetResultAsync();
```

## Configuration

```csharp
public class CallbackConfig
{
public TimeSpan Timeout { get; set; } // overall callback timeout, ≥ 1s or Zero (default = no timeout)
public TimeSpan HeartbeatTimeout { get; set; } // heartbeat-gap timeout, ≥ 1s or Zero (default = no timeout)
}

public class WaitForCallbackConfig : CallbackConfig
{
public IRetryStrategy? RetryStrategy { get; set; } // applied to the submitter step only
}
```
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Child Contexts

`RunInChildContextAsync` runs a sub-workflow inside its own deterministic operation-ID space. The child's return value is checkpointed as a single `CONTEXT` operation, so subsequent invocations replay the cached value without re-executing the contained operations. Use to group related steps under a shared error/observability boundary.

## Signatures

```csharp
Task<T> RunInChildContextAsync<T>(
Func<IDurableContext, Task<T>> func,
string? name = null,
ChildContextConfig? config = null,
CancellationToken cancellationToken = default);

Task RunInChildContextAsync(
Func<IDurableContext, Task> func,
string? name = null,
ChildContextConfig? config = null,
CancellationToken cancellationToken = default);
```

## Example

```csharp
var phaseResult = await ctx.RunInChildContextAsync<string>(
async childCtx =>
{
var validated = await childCtx.StepAsync(async _ => Validate(input), name: "validate");
await childCtx.WaitAsync(TimeSpan.FromSeconds(2), name: "short_wait");
var processed = await childCtx.StepAsync(async _ => Process(validated), name: "process");
return processed;
},
name: "phase",
config: new ChildContextConfig { SubType = "OrderProcessing" });
```

## Configuration

```csharp
public sealed class ChildContextConfig
{
public string? SubType { get; set; } // observability label
public Func<Exception, Exception>? ErrorMapping { get; set; } // remap thrown exceptions
}
```

`ErrorMapping` lets you translate exceptions thrown inside the child context into a domain-specific exception type before they propagate to the parent.
148 changes: 148 additions & 0 deletions Libraries/src/Amazon.Lambda.DurableExecution/docs/core/steps.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# Steps

`StepAsync` runs a unit of work whose result is checkpointed. On replay, completed steps return their cached result without re-executing.

## Signatures

```csharp
Task<T> StepAsync<T>(
Func<IStepContext, Task<T>> func,
string? name = null,
StepConfig? config = null,
CancellationToken cancellationToken = default);

Task StepAsync(
Func<IStepContext, Task> func,
string? name = null,
StepConfig? config = null,
CancellationToken cancellationToken = default);
```

The `IStepContext` parameter exposes the current `AttemptNumber`, the deterministic `OperationId`, and a scoped `Logger`. Returned values are serialized via the `ILambdaSerializer` registered on `ILambdaContext.Serializer`.

## Basic step

```csharp
var user = await ctx.StepAsync(
async _ => await userService.GetUserAsync(userId),
name: "fetch-user");
```

## Multiple steps

```csharp
var a = await ctx.StepAsync(async _ => $"a-{input.OrderId}", name: "step_1");
var b = await ctx.StepAsync(async _ => $"{a}-b", name: "step_2");
var c = await ctx.StepAsync(async _ => $"{b}-c", name: "step_3");
```

## Step configuration

Configure step behavior with `StepConfig`:

```csharp
public sealed class StepConfig
{
public IRetryStrategy? RetryStrategy { get; set; } // null = no retry
public StepSemantics Semantics { get; set; } = StepSemantics.AtLeastOncePerRetry;
}
```

### Retry strategies

When a step throws, the configured `IRetryStrategy` decides whether to retry and after what delay.

```csharp
public interface IRetryStrategy
{
RetryDecision ShouldRetry(Exception exception, int attemptNumber);
}

public readonly struct RetryDecision
{
public bool ShouldRetry { get; }
public TimeSpan Delay { get; }

public static RetryDecision DoNotRetry();
public static RetryDecision RetryAfter(TimeSpan delay);
}
```

Built-in strategies on the `RetryStrategy` static class:

| Member | Behavior |
| --- | --- |
| `RetryStrategy.Default` | 6 attempts, 2× backoff, 5s initial, 60s max, Full jitter. |
| `RetryStrategy.Transient` | 3 attempts, 2× backoff, 1s initial, 5s max, Half jitter. |
| `RetryStrategy.None` | 1 attempt only — no retry. |
| `RetryStrategy.Exponential(...)` | Builder for custom exponential strategies. |
| `RetryStrategy.FromDelegate(Func<Exception, int, RetryDecision>)` | Wrap a custom decision function. |

`Exponential` parameters:

```csharp
public static IRetryStrategy Exponential(
int maxAttempts = 3,
TimeSpan? initialDelay = null, // default 5s
TimeSpan? maxDelay = null, // default 300s
double backoffRate = 2.0,
JitterStrategy jitter = JitterStrategy.Full,
Type[]? retryableExceptions = null,
string[]? retryableMessagePatterns = null);

public enum JitterStrategy { None, Full, Half }
```

When `retryableExceptions` and `retryableMessagePatterns` are both null (default), every exception is retried up to `maxAttempts`. If either is set, only matching exceptions are retried.

#### Step with retries

```csharp
var result = await ctx.StepAsync<string>(
async stepCtx =>
{
if (stepCtx.AttemptNumber < 3)
throw new InvalidOperationException($"flake on attempt {stepCtx.AttemptNumber}");
return $"ok on attempt {stepCtx.AttemptNumber}";
},
name: "flaky_step",
config: new StepConfig
{
RetryStrategy = RetryStrategy.Exponential(
maxAttempts: 3,
initialDelay: TimeSpan.FromSeconds(2),
maxDelay: TimeSpan.FromSeconds(10),
backoffRate: 2.0,
jitter: JitterStrategy.None)
});
```

### Step semantics

Control how a step behaves when interrupted mid-execution:

```csharp
public enum StepSemantics
{
AtLeastOncePerRetry, // default — body may re-execute if Lambda is re-invoked mid-attempt
AtMostOncePerRetry // body executes at most once per retry attempt
}
```

| Semantic | Behavior | Use case |
| --- | --- | --- |
| `AtLeastOncePerRetry` (default) | Re-executes the step if interrupted before completion. | Idempotent operations (database upserts, API calls with idempotency keys). |
| `AtMostOncePerRetry` | Never re-executes; throws if interrupted. | Non-idempotent operations (sending email, charging payments). |

These semantics apply *per retry attempt*, not per overall execution. To achieve true at-most-once across the whole workflow, combine with `RetryStrategy.None`:

```csharp
var result = await ctx.StepAsync(
async _ => await paymentService.ChargeAsync(amount),
name: "charge-payment",
config: new StepConfig
{
Semantics = StepSemantics.AtMostOncePerRetry,
RetryStrategy = RetryStrategy.None
});
```
Loading