AI Gateway providing OpenAI & Anthropic compatible APIs backed by AWS Bedrock
Chinese · Architecture · Deployment · API Reference
| Feature | Description |
|---|---|
| Dual API, one key | Both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) endpoints. The same sk-ant-api03_ key works everywhere — Cursor, Cline, Claude Code, OpenAI SDK. |
| Up to 90% cost savings | Prompt caching reads at 0.1x price. Agent loops save ~60% after just 2 requests. |
| Enterprise security | 3-layer CSRF, AWS WAF, SHA256 + AES-128 token protection, OAuth SSO (Cognito / Entra ID). |
| Production-ready | Distributed Redis rate limiting, HPA autoscaling (1-10 Pods), streaming heartbeat, Karpenter node scaling. |
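The caching economics can be sketched with illustrative numbers: a 10,000-token shared prefix, a normalized input price of 1.0 per token, and the common 1.25x cache-write surcharge are all assumptions here; the exact crossover point depends on prefix size and write cost, but savings approach the 90% read discount as an agent loop grows.

```python
def agent_loop_cost(n_requests: int, prefix_tokens: int = 10_000,
                    price: float = 1.0, write_mult: float = 1.25,
                    read_mult: float = 0.1) -> tuple[float, float]:
    """Cost of re-sending a shared prefix n times, without and with caching."""
    uncached = n_requests * prefix_tokens * price
    # First request writes the cache (1.25x); later requests read it at 0.1x.
    cached = prefix_tokens * price * (write_mult + (n_requests - 1) * read_mult)
    return uncached, cached

for n in (2, 5, 20):
    uncached, cached = agent_loop_cost(n)
    print(f"{n:>2} requests: savings {(1 - cached / uncached):.0%}")
```

As the loop lengthens, the one-time write surcharge amortizes away and the per-request cost converges to 0.1x.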
```mermaid
graph LR
    Client["Client (Cursor / Claude Code / SDK)"] -->|OpenAI or Anthropic API| Backend["Backend (FastAPI)"]
    Frontend["Frontend (Vue 3 + Quasar)"] -->|Admin API| Backend
    Backend -->|InvokeModel - Claude| Bedrock["AWS Bedrock"]
    Backend -->|Converse API - Nova / DeepSeek| Bedrock
    Backend -->|asyncpg| DB[(PostgreSQL)]
    Backend -.->|Cache + Rate Limit| Redis[(Redis)]
    subgraph "AWS EKS"
        Frontend
        Backend
    end
```
Dual API routing — Clients choose their preferred format:
| Endpoint | Auth | Format | Clients |
|---|---|---|---|
| POST /v1/chat/completions | Authorization: Bearer | OpenAI | Cursor, Cline, OpenAI SDK |
| POST /v1/messages | x-api-key | Anthropic Messages | Claude Code, Anthropic SDK |
| GET /v1/models | Both | OpenAI | All clients |
Both routes go through the same Bedrock backend, token validation, quota tracking, and prompt caching pipeline.
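As an illustration of that shared pipeline, both wire formats can be reduced to one internal request shape before anything touches Bedrock. The field names below are hypothetical, not the gateway's actual schema:

```python
def normalize(payload: dict, api: str) -> dict:
    """Reduce an OpenAI or Anthropic request body to one internal shape."""
    if api == "openai":
        # OpenAI carries the system prompt as a role="system" message.
        system = [m["content"] for m in payload["messages"] if m["role"] == "system"]
        messages = [m for m in payload["messages"] if m["role"] != "system"]
        return {
            "model": payload["model"],
            "system": system[0] if system else None,
            "messages": messages,
            "max_tokens": payload.get("max_tokens", 4096),
            "stream": payload.get("stream", False),
        }
    # Anthropic keeps the system prompt in a top-level "system" field
    # and makes max_tokens mandatory.
    return {
        "model": payload["model"],
        "system": payload.get("system"),
        "messages": payload["messages"],
        "max_tokens": payload["max_tokens"],
        "stream": payload.get("stream", False),
    }
```

Past this point, token validation, quota tracking, and caching only ever see the internal shape.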
- Python 3.12+, Node.js 18+, PostgreSQL 15+ (or Docker)
- AWS credentials with Bedrock access
- uv package manager
```bash
docker run -d --name kbp-postgres \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=kolyabrproxy -p 5432:5432 postgres:15
```

```bash
cd backend
uv sync
cp .env.example .env  # edit with your values
uv run alembic upgrade head
KBR_ENV=local uv run python main.py
```

Backend runs at http://localhost:8000 (Swagger UI at /docs when KBR_DEBUG=true).
```bash
cd frontend && npm install && npm run dev
```

Frontend runs at http://localhost:9000.
```bash
# OpenAI-compatible
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Anthropic-compatible
curl http://localhost:8000/v1/messages \
  -H "x-api-key: sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","max_tokens":1024,"messages":[{"role":"user","content":"Hi"}],"stream":true}'
```

```bash
export ANTHROPIC_BASE_URL=https://api.your-domain.com/v1
export ANTHROPIC_API_KEY=sk-ant-api03_YOUR_TOKEN
```

Claude Code will auto-discover models via /v1/models and send requests to /v1/messages.
| Setting | Value |
|---|---|
| Base URL | https://api.your-domain.com/v1 |
| API Key | sk-ant-api03_YOUR_TOKEN |
| Model | us.anthropic.claude-sonnet-4-20250514-v1:0 |
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

response = client.chat.completions.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

```python
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

message = client.messages.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(message.content[0].text)
```

- OpenAI compatible — `/v1/chat/completions`, `/v1/models`
- Anthropic compatible — `/v1/messages` with full thinking, adaptive mode, and tool use support
- Streaming and non-streaming with 15s heartbeat keep-alive
- Multi-modal (text + images), tool calling, extended thinking
- Anthropic Claude via native InvokeModel API (thinking, effort, prompt caching)
- Amazon Nova, DeepSeek, Mistral, Llama via Converse API
- 19 providers through unified translation layer
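The split between the two Bedrock APIs can be sketched as a simple dispatch on the model ID; the prefix list below is an assumption for illustration, not the gateway's actual routing table:

```python
# Families served through the native InvokeModel path (per the list above);
# everything else falls back to the Converse API. Illustrative only.
INVOKE_MODEL_PREFIXES = ("anthropic.", "us.anthropic.", "eu.anthropic.")

def pick_bedrock_api(model_id: str) -> str:
    """Choose the Bedrock API for a given model identifier."""
    if model_id.startswith(INVOKE_MODEL_PREFIXES):
        return "invoke_model"   # Claude: thinking, effort, prompt caching
    return "converse"           # Nova, DeepSeek, Mistral, Llama, ...
```

Keeping the dispatch on the model ID means new Converse-served providers need no routing changes.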
- Prompt caching — 90% discount on reads, auto-injection of cache breakpoints (up to 4 per request)
- Per-token billing — Dynamic pricing from AWS API (181+ regional pricing records)
- Real-time tracking — Background async usage recording with per-token quota limits
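Breakpoint auto-injection can be sketched as marking the most recent stable message boundaries with `cache_control`, capped at the four-breakpoint limit. The selection heuristic here is a simplified assumption, not the gateway's exact strategy:

```python
MAX_BREAKPOINTS = 4  # Anthropic allows at most 4 cache_control markers per request

def inject_breakpoints(messages: list[dict]) -> list[dict]:
    """Mark up to 4 trailing message boundaries as cacheable prefixes."""
    marked = 0
    # Walk from the end: the most recent boundaries cover the longest
    # prefixes, so they benefit most from caching in an agent loop.
    for msg in reversed(messages):
        if marked >= MAX_BREAKPOINTS:
            break
        content = msg["content"]
        if isinstance(content, list) and content:
            # cache_control on a block caches everything up to that block.
            content[-1]["cache_control"] = {"type": "ephemeral"}
            marked += 1
    return messages
```

Clients therefore get cache hits without ever setting `cache_control` themselves.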
- API tokens: SHA256 hash index for O(1) lookup + Fernet AES-128 encrypted storage
- OAuth SSO: Cognito (default) + Microsoft Entra ID, PKCE + HttpOnly refresh cookies
- CSRF: Origin + Referer + custom header triple validation
- WAF: Rate limiting (20/300/2000 req per 5min by tier), SQLi/XSS managed rules
- Secrets: External Secrets Operator + AWS Secrets Manager, auto-sync via Pod Identity
- Kubernetes-native: EKS + Karpenter + Metrics Server
- Two deployment modes: full IaC (`deploy-all.sh`) or existing cluster (`deploy-to-existing.sh`)
- Optional Global Accelerator for Anycast low-latency routing
- Distributed Redis token bucket rate limiting with per-Pod fallback
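The per-Pod fallback path can be sketched as a plain in-memory token bucket that takes over when the distributed Redis limiter is unreachable; the class name and parameters are illustrative, not the service's actual implementation:

```python
import time

class LocalTokenBucket:
    """In-process rate-limit fallback for when Redis is down."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A local bucket is stricter than the cluster-wide budget divided by Pod count would suggest, but it fails safe: a Redis outage degrades throughput instead of removing limits entirely.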
| Layer | Technology |
|---|---|
| Frontend | Vue 3, Quasar, TypeScript, Pinia, Vite |
| Backend | Python 3.12+, FastAPI, SQLAlchemy (async), Alembic, Pydantic |
| Database | PostgreSQL (Aurora in prod), asyncpg |
| Cache | Redis (rate limiting, token caching) |
| Auth | JWT, AWS Cognito, Microsoft OAuth |
| Cloud | AWS Bedrock, EKS, ECR, WAF, Secrets Manager |
| IaC | Terraform, Karpenter, External Secrets Operator |
| Document | Description |
|---|---|
| Architecture | System overview, component diagrams, auth flows |
| API Reference | Full endpoint docs with examples |
| Request Translation | OpenAI/Anthropic to Bedrock format mapping |
| Prompt Caching | Auto-injection, cost model, breakpoint strategy |
| Pricing System | Per-token billing, dynamic pricing |
| Security | CSRF, WAF, token protection, OAuth |
| Performance | Streaming, rate limiting, timeout tuning, HPA |
| Deployment SOP | Deploy, teardown, and operations |
| OAuth Setup | Cognito & Microsoft OAuth configuration |
```bash
# Backend
cd backend
uv run ruff check .    # lint
uv run ruff format .   # format
uv run pytest          # test

# Frontend
cd frontend
npm run lint           # lint
npm run format         # format
```

MIT License — see LICENSE for details.