Kolya BR Proxy

AI Gateway providing OpenAI & Anthropic compatible APIs backed by AWS Bedrock

中文 (Chinese) · Architecture · Deployment · API Reference



Why Kolya BR Proxy?

  • Dual API, one key: Both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) endpoints. The same sk-ant-api03_ key works everywhere: Cursor, Cline, Claude Code, OpenAI SDK.
  • Up to 90% cost savings: Prompt-cache reads are billed at 0.1x the input price; agent loops save ~60% after just 2 requests.
  • Enterprise security: 3-layer CSRF protection, AWS WAF, SHA256 + AES-128 token protection, OAuth SSO (Cognito / Entra ID).
  • Production-ready: Distributed Redis rate limiting, HPA autoscaling (1-10 Pods), streaming heartbeat, Karpenter node scaling.
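The caching arithmetic is easy to sanity-check. A back-of-the-envelope sketch using the 0.1x cached-read multiplier from above; the prefix and per-turn token counts are illustrative assumptions, and real savings depend on prompt shape and loop length:

```python
# Illustrative input-cost arithmetic for prompt caching in an agent loop.
# Assumptions (not from the project): a 10,000-token shared prefix
# (system prompt + tools), 500 fresh tokens per turn, cached reads at 0.1x.
PREFIX_TOKENS = 10_000
FRESH_TOKENS_PER_TURN = 500
CACHED_READ_MULTIPLIER = 0.1

def input_cost_units(turns: int, cached: bool) -> float:
    """Relative input cost (in uncached-token units) after `turns` requests."""
    if not cached:
        return turns * (PREFIX_TOKENS + FRESH_TOKENS_PER_TURN)
    # First turn writes the cache at full price; later turns read it at 0.1x.
    first = PREFIX_TOKENS + FRESH_TOKENS_PER_TURN
    later = (turns - 1) * (PREFIX_TOKENS * CACHED_READ_MULTIPLIER + FRESH_TOKENS_PER_TURN)
    return first + later

for turns in (2, 5, 10):
    saving = 1 - input_cost_units(turns, cached=True) / input_cost_units(turns, cached=False)
    print(f"{turns} turns: ~{saving:.0%} saved on input tokens")
```

With a large shared prefix, cumulative savings climb toward the 90% read discount as the loop gets longer.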

Screenshots

Dashboard

More screenshots: API Keys · Models · Playground · Monitoring

Architecture

graph LR
    Client["Client (Cursor / Claude Code / SDK)"] -->|OpenAI or Anthropic API| Backend["Backend (FastAPI)"]
    Frontend["Frontend (Vue 3 + Quasar)"] -->|Admin API| Backend
    Backend -->|InvokeModel - Claude| Bedrock["AWS Bedrock"]
    Backend -->|Converse API - Nova / DeepSeek| Bedrock
    Backend -->|asyncpg| DB[(PostgreSQL)]
    Backend -.->|Cache + Rate Limit| Redis[(Redis)]
    subgraph AWS EKS
        Frontend
        Backend
    end

Dual API routing — Clients choose their preferred format:

| Endpoint | Auth | Format | Clients |
| --- | --- | --- | --- |
| POST /v1/chat/completions | Authorization: Bearer | OpenAI | Cursor, Cline, OpenAI SDK |
| POST /v1/messages | x-api-key | Anthropic Messages | Claude Code, Anthropic SDK |
| GET /v1/models | Both | OpenAI | All clients |

Both routes go through the same Bedrock backend, token validation, quota tracking, and prompt caching pipeline.
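Sharing one Bedrock pipeline implies a translation step between the two request formats. A minimal sketch of the OpenAI-to-Anthropic direction, not the project's actual translator: it handles only system prompts and plain-text messages, and everything beyond the two public schemas is an assumption.

```python
def openai_to_anthropic(body: dict) -> dict:
    """Map an OpenAI chat.completions request to Anthropic Messages shape.
    Minimal sketch: system prompts and plain text only."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    messages = [m for m in body["messages"] if m["role"] != "system"]
    out = {
        "model": body["model"],
        # max_tokens is required by the Anthropic schema but optional in OpenAI's.
        "max_tokens": body.get("max_tokens", 1024),
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    if body.get("stream"):
        out["stream"] = True
    return out

req = {
    "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hi"},
    ],
    "stream": True,
}
print(openai_to_anthropic(req))
```

The reverse direction, plus tool calls and multi-modal content, is what the project's translation layer documents separately.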


Quick Start

Prerequisites

  • Python 3.12+, Node.js 18+, PostgreSQL 15+ (or Docker)
  • AWS credentials with Bedrock access
  • uv package manager

1. Database

docker run -d --name kbp-postgres \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=kolyabrproxy -p 5432:5432 postgres:15

2. Backend

cd backend
uv sync
cp .env.example .env        # edit with your values
uv run alembic upgrade head
KBR_ENV=local uv run python main.py

Backend runs at http://localhost:8000 (Swagger UI at /docs when KBR_DEBUG=true).

3. Frontend

cd frontend && npm install && npm run dev

Frontend runs at http://localhost:9000.

4. Test

# OpenAI-compatible
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Anthropic-compatible
curl http://localhost:8000/v1/messages \
  -H "x-api-key: sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","max_tokens":1024,"messages":[{"role":"user","content":"Hi"}],"stream":true}'

Client Configuration

Claude Code

export ANTHROPIC_BASE_URL=https://api.your-domain.com/v1
export ANTHROPIC_API_KEY=sk-ant-api03_YOUR_TOKEN

Claude Code will auto-discover models via /v1/models and send requests to /v1/messages.

Cursor / Cline

| Setting | Value |
| --- | --- |
| Base URL | https://api.your-domain.com/v1 |
| API Key | sk-ant-api03_YOUR_TOKEN |
| Model | us.anthropic.claude-sonnet-4-20250514-v1:0 |

OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

response = client.chat.completions.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Anthropic SDK (Python)

import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

message = client.messages.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Key Features

Dual API Gateway

  • OpenAI-compatible: /v1/chat/completions, /v1/models
  • Anthropic-compatible: /v1/messages with full thinking, adaptive mode, and tool-use support
  • Streaming and non-streaming with 15s heartbeat keep-alive
  • Multi-modal (text + images), tool calling, extended thinking
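The heartbeat keeps long-idle streams from being dropped by intermediaries. A minimal asyncio sketch of the pattern, not the project's implementation; the demo interval is shortened so the effect is visible:

```python
import asyncio

HEARTBEAT = ": ping\n\n"  # an SSE comment line; clients ignore it

async def with_heartbeat(upstream, interval: float = 15.0):
    """Relay upstream chunks, emitting an SSE comment whenever the
    stream is idle for `interval` seconds."""
    queue: asyncio.Queue = asyncio.Queue()
    done = object()

    async def pump():
        async for chunk in upstream:
            await queue.put(chunk)
        await queue.put(done)

    task = asyncio.create_task(pump())
    try:
        while True:
            try:
                item = await asyncio.wait_for(queue.get(), timeout=interval)
            except asyncio.TimeoutError:
                yield HEARTBEAT  # keep proxies/load balancers from closing the socket
                continue
            if item is done:
                return
            yield item
    finally:
        task.cancel()

async def slow_upstream():
    yield "data: one\n\n"
    await asyncio.sleep(0.05)  # simulate a slow model turn
    yield "data: two\n\n"

async def demo():
    return [c async for c in with_heartbeat(slow_upstream(), interval=0.01)]

print(asyncio.run(demo()))
```

The queue decouples the upstream read from the timeout, so a timeout never cancels the in-flight Bedrock request itself.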

Multi-Provider Support

  • Anthropic Claude via native InvokeModel API (thinking, effort, prompt caching)
  • Amazon Nova, DeepSeek, Mistral, Llama via Converse API
  • 19 providers through unified translation layer

Cost Optimization

  • Prompt caching — 90% discount on reads, auto-injection of cache breakpoints (up to 4 per request)
  • Per-token billing — Dynamic pricing from AWS API (181+ regional pricing records)
  • Real-time tracking — Background async usage recording with per-token quota limits
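Breakpoint auto-injection can be pictured with a simple heuristic (illustrative only, not the project's actual placement strategy): mark the end of the system prompt and the most recent user turns with Anthropic's cache_control field, staying within the 4-breakpoint limit.

```python
MAX_BREAKPOINTS = 4  # Anthropic's per-request cache_control limit

def inject_cache_breakpoints(system: list[dict], messages: list[dict]) -> int:
    """Mark cacheable prefix boundaries in place; return breakpoints used.
    Heuristic sketch: cache the system prompt, then the newest user turns."""
    used = 0
    if system:
        system[-1]["cache_control"] = {"type": "ephemeral"}
        used += 1
    # Walk user turns from newest to oldest; each marked turn makes
    # everything before it a reusable cached prefix.
    for msg in reversed(messages):
        if used >= MAX_BREAKPOINTS:
            break
        if msg["role"] == "user" and isinstance(msg["content"], list):
            msg["content"][-1]["cache_control"] = {"type": "ephemeral"}
            used += 1
    return used

system = [{"type": "text", "text": "You are a helpful agent."}]
messages = [
    {"role": "user", "content": [{"type": "text", "text": "turn 1"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "ok"}]},
    {"role": "user", "content": [{"type": "text", "text": "turn 2"}]},
]
print(inject_cache_breakpoints(system, messages))  # → 3
```

Marking recent user turns is what makes agent loops cheap: each new request re-reads the previous turns' prefix at the cached rate.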

Security

  • API tokens: SHA256 hash index for O(1) lookup + Fernet AES-128 encrypted storage
  • OAuth SSO: Cognito (default) + Microsoft Entra ID, PKCE + HttpOnly refresh cookies
  • CSRF: Origin + Referer + custom header triple validation
  • WAF: Rate limiting (20/300/2000 req per 5min by tier), SQLi/XSS managed rules
  • Secrets: External Secrets Operator + AWS Secrets Manager, auto-sync via Pod Identity

Infrastructure

  • Kubernetes-native: EKS + Karpenter + Metrics Server
  • Two deployment modes: full IaC (deploy-all.sh) or existing cluster (deploy-to-existing.sh)
  • Optional Global Accelerator for Anycast low-latency routing
  • Distributed Redis token bucket rate limiting with per-Pod fallback
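The per-Pod fallback amounts to a classic token bucket. An in-memory sketch of the idea (the distributed version keeps the same counters in Redis; the rate and capacity below are illustrative, not the WAF tiers):

```python
import time

class TokenBucket:
    """In-memory token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, clamped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)   # burst of 2, refills 5/sec
print([bucket.allow() for _ in range(3)])  # → [True, True, False]
```

Moving the same refill-then-spend logic into a Redis Lua script makes it atomic across Pods, which is the usual way such limiters are distributed.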

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Vue 3, Quasar, TypeScript, Pinia, Vite |
| Backend | Python 3.12+, FastAPI, SQLAlchemy (async), Alembic, Pydantic |
| Database | PostgreSQL (Aurora in prod), asyncpg |
| Cache | Redis (rate limiting, token caching) |
| Auth | JWT, AWS Cognito, Microsoft OAuth |
| Cloud | AWS Bedrock, EKS, ECR, WAF, Secrets Manager |
| IaC | Terraform, Karpenter, External Secrets Operator |

Documentation

| Document | Description |
| --- | --- |
| Architecture | System overview, component diagrams, auth flows |
| API Reference | Full endpoint docs with examples |
| Request Translation | OpenAI/Anthropic to Bedrock format mapping |
| Prompt Caching | Auto-injection, cost model, breakpoint strategy |
| Pricing System | Per-token billing, dynamic pricing |
| Security | CSRF, WAF, token protection, OAuth |
| Performance | Streaming, rate limiting, timeout tuning, HPA |
| Deployment SOP | Deploy, teardown, and operations |
| OAuth Setup | Cognito & Microsoft OAuth configuration |

Development

# Backend
cd backend
uv run ruff check .     # lint
uv run ruff format .    # format
uv run pytest           # test

# Frontend
cd frontend
npm run lint            # lint
npm run format          # format

License

MIT License — see LICENSE for details.
