3 changes: 2 additions & 1 deletion .gitignore
@@ -10,6 +10,8 @@ Chart.lock
**/secrets.yml
values-local.yaml
values-local.yml
values-*.yaml
values-*.yml

# Helm output and temporary files
*.tmp
@@ -31,4 +33,3 @@ test-output/
manifests/
rendered/
debug/

2 changes: 1 addition & 1 deletion README.md
@@ -21,6 +21,6 @@ helm upgrade --install \

## Prerequisites

-Before installing the Braintrust Helm chart, ensure you have run the appropriate braintrust terraform module [Google](https://github.com/braintrustdata/terraform-google-braintrust-data-plane) or [Azure](https://github.com/braintrustdata/terraform-azure-braintrust-data-plane) to deploy the base infrastructure.
+Before installing the Braintrust Helm chart, ensure you have run the appropriate Braintrust Terraform module for [AWS](https://github.com/braintrustdata/terraform-aws-braintrust-data-plane), [Google](https://github.com/braintrustdata/terraform-google-braintrust-data-plane), or [Azure](https://github.com/braintrustdata/terraform-azure-braintrust-data-plane) to deploy the base infrastructure.

See the [Braintrust Helm Chart](./braintrust/README.md) for more details.
29 changes: 29 additions & 0 deletions braintrust/README.md
@@ -90,6 +90,17 @@ brainstore:

**Supported machine families:** c4, c4d

If you need the request to cover more than the cache volume alone, set an explicit total pod-local storage budget:

```yaml
brainstore:
  reader:
    volume:
      size: "900Gi"
    ephemeralStorage:
      request: "905Gi" # cache + /tmp (if enabled) + logs/writable-layer overhead
```

### GKE Standard Mode

For Standard mode clusters, create node pools with local SSDs, then deploy:
@@ -147,6 +158,18 @@ For Standard mode clusters, create node pools with local SSDs, then deploy:
- Local SSDs are automatically available via emptyDir volumes
- Pod anti-affinity ensures readers and writers don't share nodes (each pod gets dedicated node access)

## AWS EKS Local Storage

On EKS, Brainstore uses Kubernetes-managed `emptyDir` volumes for cache storage. To make scheduling reflect the real local-disk budget, set `brainstore.<role>.ephemeralStorage.request` for each Brainstore role.

Size the request for the pod's full local-storage usage:
- cache `emptyDir`
- optional `/tmp` `emptyDir`
- container logs
- writable layer overhead

When you enable `tmpVolume`, make sure the `ephemeralStorage.request` still covers that extra space.
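
As a rough illustration of the sizing rule above, the fragment below adds up the components for one role. The keys come from this chart's values, but every number is an assumed placeholder (a 500Gi cache and 4Gi of log and writable-layer headroom are not recommendations), so substitute figures that match your node's local disk.

```yaml
# Illustrative sizing only; all sizes below are assumptions, not defaults.
brainstore:
  writer:
    tmpVolume:
      enabled: true
      sizeLimit: "1Gi"        # optional /tmp emptyDir
    ephemeralStorage:
      #   500Gi  cache emptyDir (assumed)
      # +   1Gi  /tmp emptyDir (matches sizeLimit above)
      # +   4Gi  container logs and writable layer overhead (assumed)
      # = 505Gi
      request: "505Gi"
```

If `tmpVolume` is later disabled, the request in this sketch could drop by the same 1Gi; the remaining terms stay unchanged.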

## Testing

This Helm chart includes comprehensive automated unit tests.
@@ -192,3 +215,9 @@ This version also adds first-class `brainstoreWalFooterVersion` support and auto
## Example Values Files

Example values files for different cloud providers and configurations are located in the `examples/` folder.

- `examples/aws-eks/values.yaml`: AWS EKS deployment without a quarantine VPC. User-defined functions execute in the API pod. Includes the API service annotations needed for the Terraform-managed CloudFront plus adopted internal NLB path.
- `examples/aws-eks-quarantine/values.yaml`: AWS EKS deployment with user-defined functions routed into the quarantine VPC. Includes the API service annotations needed for the Terraform-managed CloudFront plus adopted internal NLB path.
- `examples/google-autopilot/values.yaml`: GKE Autopilot deployment.
- `examples/google-autopilot-cel/values.yaml`: GKE Autopilot deployment with CEL-friendly security settings.
- `examples/google-standard/values.yaml`: GKE Standard deployment.
67 changes: 67 additions & 0 deletions braintrust/examples/aws-eks-cel/values.yaml
@@ -0,0 +1,67 @@
# CEL-friendly overlay for AWS EKS deployments.
#
# Use this together with the Terraform-generated EKS values file, for example:
#   helm upgrade --install braintrust ./braintrust \
#     --namespace braintrust \
#     --values /path/to/braintrust-generated-values.yaml \
#     --values ./braintrust/examples/aws-eks-cel/values.yaml
#
# This file intentionally does not repeat AWS-specific service account, bucket,
# or NLB settings. Those should continue to come from the Terraform-generated
# values so the chart stays aligned with the cluster infrastructure.

cloud: "aws"

api:
  securityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
  tmpVolume:
    enabled: true
    sizeLimit: "1Gi"

brainstore:
  reader:
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
    ephemeralStorage:
      # Include cache, this /tmp volume, logs, and writable layer overhead.
      request: "<your reader total pod-local storage budget>"
    tmpVolume:
      enabled: true
      sizeLimit: "1Gi"

  fastreader:
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
    ephemeralStorage:
      # Include cache, this /tmp volume, logs, and writable layer overhead.
      request: "<your fastreader total pod-local storage budget>"
    tmpVolume:
      enabled: true
      sizeLimit: "1Gi"

  writer:
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
    ephemeralStorage:
      # Include cache, this /tmp volume, logs, and writable layer overhead.
      request: "<your writer total pod-local storage budget>"
    tmpVolume:
      enabled: true
      sizeLimit: "1Gi"
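
Because Helm merges multiple `--values` files key by key and later files take precedence, the placeholder `request` strings in this overlay need to be replaced with real quantities before use; otherwise they would override any request set by an earlier file. The sketch below shows what `brainstore.reader` might resolve to after the merge. It assumes, hypothetically, that the generated file supplies `replicas: 2` and that the overlay's placeholder was replaced with `"905Gi"`; neither value comes from the Terraform module itself.

```yaml
# Effective values for brainstore.reader after both files are merged (sketch).
# `replicas: 2` and "905Gi" are hypothetical illustration values.
brainstore:
  reader:
    replicas: 2                     # from the generated values file (assumed)
    securityContext:                # from the CEL overlay
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
    ephemeralStorage:
      request: "905Gi"              # overlay value wins over any earlier file
    tmpVolume:                      # from the CEL overlay
      enabled: true
      sizeLimit: "1Gi"
```

Keys that appear only in the earlier file (service account, buckets, NLB annotations) pass through unchanged, which is why the overlay can leave them out.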
118 changes: 118 additions & 0 deletions braintrust/examples/aws-eks-quarantine/values.yaml
@@ -0,0 +1,118 @@
# Sample values for AWS EKS deployment with a quarantine VPC

global:
  orgName: "<your Braintrust org name>"
  namespace: "braintrust"

cloud: "aws"

objectStorage:
  aws:
    brainstoreBucket: "<your brainstore bucket name>"
    responseBucket: "<your response bucket name>"
    codeBundleBucket: "<your code bundle bucket name>"

api:
  name: "braintrust-api"
  replicas: 1
  # Disable in-pod code execution so user-defined functions run in the quarantine VPC.
  allowCodeFunctionExecution: false
  annotations:
    service:
      # Internal NLB via the AWS Load Balancer Controller.
      # If you are using the terraform-aws-braintrust-data-plane EKS CloudFront path,
      # set these so the controller adopts the pre-created internal NLB.
      service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
      service.beta.kubernetes.io/aws-load-balancer-type: "external"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "<your pre-created NLB security group ID>"
      service.beta.kubernetes.io/aws-load-balancer-name: "<your pre-created NLB name>"
  service:
    type: LoadBalancer
    port: 8000
    portName: http
  serviceAccount:
    name: "braintrust-api"
    awsRoleArn: "<your Braintrust API IAM role ARN>"
  resources:
    requests:
      cpu: "4"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  extraEnvVars:
    - name: QUARANTINE_INVOKE_ROLE
      value: "<your quarantine invoke role ARN>"
    - name: QUARANTINE_FUNCTION_ROLE
      value: "<your quarantine function role ARN>"
    - name: QUARANTINE_REGION
      value: "<your AWS region>"
    - name: QUARANTINE_PRIVATE_SUBNET_1_ID
      value: "<your quarantine private subnet 1 ID>"
    - name: QUARANTINE_PRIVATE_SUBNET_2_ID
      value: "<your quarantine private subnet 2 ID>"
    - name: QUARANTINE_PRIVATE_SUBNET_3_ID
      value: "<your quarantine private subnet 3 ID>"
    - name: QUARANTINE_PUB_PRIVATE_VPC_DEFAULT_SECURITY_GROUP
      value: "<your quarantine Lambda security group ID>"
    - name: QUARANTINE_PUB_PRIVATE_VPC_ID
      value: "<your quarantine VPC ID>"
  # nodeSelector:
  #   topology.kubernetes.io/zone: us-east-1a

brainstore:
  serviceAccount:
    name: "brainstore"
    awsRoleArn: "<your Brainstore IAM role ARN>"
  reader:
    name: "brainstore-reader"
    replicas: 2
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "16"
        memory: "32Gi"
      limits:
        cpu: "16"
        memory: "32Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your reader total pod-local storage budget>"
  fastreader:
    name: "brainstore-fastreader"
    replicas: 2
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "16"
        memory: "32Gi"
      limits:
        cpu: "16"
        memory: "32Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your fastreader total pod-local storage budget>"
  writer:
    name: "brainstore-writer"
    replicas: 1
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "32"
        memory: "64Gi"
      limits:
        cpu: "32"
        memory: "64Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your writer total pod-local storage budget>"
99 changes: 99 additions & 0 deletions braintrust/examples/aws-eks/values.yaml
@@ -0,0 +1,99 @@
# Sample values for AWS EKS deployment without a quarantine VPC

global:
  orgName: "<your Braintrust org name>"
  namespace: "braintrust"

cloud: "aws"

objectStorage:
  aws:
    brainstoreBucket: "<your brainstore bucket name>"
    responseBucket: "<your response bucket name>"
    codeBundleBucket: "<your code bundle bucket name>"

api:
  name: "braintrust-api"
  annotations:
    service:
      # Internal NLB via the AWS Load Balancer Controller.
      # If you are using the terraform-aws-braintrust-data-plane EKS CloudFront path,
      # set these so the controller adopts the pre-created internal NLB.
      service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
      service.beta.kubernetes.io/aws-load-balancer-type: "external"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: "<your pre-created NLB security group ID>"
      service.beta.kubernetes.io/aws-load-balancer-name: "<your pre-created NLB name>"
  replicas: 1
  service:
    type: LoadBalancer
    port: 8000
    portName: http
  serviceAccount:
    name: "braintrust-api"
    awsRoleArn: "<your Braintrust API IAM role ARN>"
  # Keep code execution enabled when not using a quarantine VPC.
  allowCodeFunctionExecution: true
  resources:
    requests:
      cpu: "4"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"

brainstore:
  serviceAccount:
    name: "brainstore"
    awsRoleArn: "<your Brainstore IAM role ARN>"
  reader:
    name: "brainstore-reader"
    replicas: 2
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "16"
        memory: "32Gi"
      limits:
        cpu: "16"
        memory: "32Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your reader total pod-local storage budget>"
  fastreader:
    name: "brainstore-fastreader"
    replicas: 2
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "16"
        memory: "32Gi"
      limits:
        cpu: "16"
        memory: "32Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your fastreader total pod-local storage budget>"
  writer:
    name: "brainstore-writer"
    replicas: 1
    service:
      type: ClusterIP
      port: 4000
      portName: http
    resources:
      requests:
        cpu: "32"
        memory: "64Gi"
      limits:
        cpu: "32"
        memory: "64Gi"
    ephemeralStorage:
      # Total pod-local storage budget for cache, optional /tmp, logs, and writable layers.
      request: "<your writer total pod-local storage budget>"