AI Gateway providing OpenAI & Anthropic compatible APIs backed by AWS Bedrock
Chinese · Architecture · Deployment · API Reference
| Feature | Description |
|---|---|
| Dual API, one key | Both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) endpoints. The same sk-ant-api03_ key works everywhere — Cursor, Cline, Claude Code, OpenAI SDK. |
| Up to 90% cost savings | Prompt caching reads at 0.1x price. Agent loops save ~60% after just 2 requests. |
| Enterprise security | 3-layer CSRF, AWS WAF, SHA256 + AES-128 token protection, OAuth SSO (Cognito / Entra ID). |
| Production-ready | Distributed Redis rate limiting, HPA autoscaling (1-10 Pods), streaming heartbeat, Karpenter node scaling. |
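The caching economics can be sketched with illustrative numbers: a 10,000-token shared prefix, a normalized input price of 1.0 per token, and the common 1.25x cache-write surcharge are all assumptions here; the exact crossover point depends on prefix size and write cost, but savings approach the 90% read discount as an agent loop grows.

```python
def agent_loop_cost(n_requests: int, prefix_tokens: int = 10_000,
                    price: float = 1.0, write_mult: float = 1.25,
                    read_mult: float = 0.1) -> tuple[float, float]:
    """Cost of re-sending a shared prefix n times, without and with caching."""
    uncached = n_requests * prefix_tokens * price
    # First request writes the cache (1.25x); later requests read it at 0.1x.
    cached = prefix_tokens * price * (write_mult + (n_requests - 1) * read_mult)
    return uncached, cached

for n in (2, 5, 20):
    uncached, cached = agent_loop_cost(n)
    print(f"{n:>2} requests: savings {(1 - cached / uncached):.0%}")
```

As the loop lengthens, the one-time write surcharge amortizes away and the per-request cost converges to 0.1x.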
```mermaid
graph LR
    Client["Client (Cursor / Claude Code / SDK)"] -->|OpenAI or Anthropic API| Backend["Backend (FastAPI)"]
    Frontend["Frontend (Vue 3 + Quasar)"] -->|Admin API| Backend
    Backend -->|InvokeModel - Claude| Bedrock["AWS Bedrock"]
    Backend -->|Converse API - Nova / DeepSeek| Bedrock
    Backend -->|asyncpg| DB[(PostgreSQL)]
    Backend -.->|Cache + Rate Limit| Redis[(Redis)]
    subgraph "AWS EKS"
        Frontend
        Backend
    end
```
Dual API routing — Clients choose their preferred format:
| Endpoint | Auth | Format | Clients |
|---|---|---|---|
| POST /v1/chat/completions | Authorization: Bearer | OpenAI | Cursor, Cline, OpenAI SDK |
| POST /v1/messages | x-api-key | Anthropic Messages | Claude Code, Anthropic SDK |
| GET /v1/models | Both | OpenAI | All clients |
Both routes go through the same Bedrock backend, token validation, quota tracking, and prompt caching pipeline.
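As an illustration of that shared pipeline, both wire formats can be reduced to one internal request shape before anything touches Bedrock. The field names below are hypothetical, not the gateway's actual schema:

```python
def normalize(payload: dict, api: str) -> dict:
    """Reduce an OpenAI or Anthropic request body to one internal shape."""
    if api == "openai":
        # OpenAI carries the system prompt as a role="system" message.
        system = [m["content"] for m in payload["messages"] if m["role"] == "system"]
        messages = [m for m in payload["messages"] if m["role"] != "system"]
        return {
            "model": payload["model"],
            "system": system[0] if system else None,
            "messages": messages,
            "max_tokens": payload.get("max_tokens", 4096),
            "stream": payload.get("stream", False),
        }
    # Anthropic keeps the system prompt in a top-level "system" field
    # and makes max_tokens mandatory.
    return {
        "model": payload["model"],
        "system": payload.get("system"),
        "messages": payload["messages"],
        "max_tokens": payload["max_tokens"],
        "stream": payload.get("stream", False),
    }
```

Past this point, token validation, quota tracking, and caching only ever see the internal shape.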
- Python 3.12+, Node.js 18+, PostgreSQL 15+ (or Docker)
- AWS credentials with Bedrock access
- uv package manager
```bash
docker run -d --name kbp-postgres \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=kolyabrproxy -p 5432:5432 postgres:15
```

```bash
cd backend
uv sync
cp .env.example .env  # edit with your values
uv run alembic upgrade head
KBR_ENV=local uv run python main.py
```

Backend runs at http://localhost:8000 (Swagger UI at /docs when KBR_DEBUG=true).
```bash
cd frontend && npm install && npm run dev
```

Frontend runs at http://localhost:9000.
```bash
# OpenAI-compatible
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Anthropic-compatible
curl http://localhost:8000/v1/messages \
  -H "x-api-key: sk-ant-api03_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"us.anthropic.claude-sonnet-4-20250514-v1:0","max_tokens":1024,"messages":[{"role":"user","content":"Hi"}],"stream":true}'
```

```bash
export ANTHROPIC_BASE_URL=https://api.your-domain.com/v1
export ANTHROPIC_API_KEY=sk-ant-api03_YOUR_TOKEN
```

Claude Code will auto-discover models via /v1/models and send requests to /v1/messages.
| Setting | Value |
|---|---|
| Base URL | https://api.your-domain.com/v1 |
| API Key | sk-ant-api03_YOUR_TOKEN |
| Model | us.anthropic.claude-sonnet-4-20250514-v1:0 |
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

response = client.chat.completions.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

```python
import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-api03_YOUR_TOKEN",  # pragma: allowlist secret
    base_url="https://api.your-domain.com/v1",
)

message = client.messages.create(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(message.content[0].text)
```

- OpenAI compatible — `/v1/chat/completions`, `/v1/models`
- Anthropic compatible — `/v1/messages` with full thinking, adaptive mode, and tool use support
- Streaming and non-streaming with 15s heartbeat keep-alive
- Multi-modal (text + images), tool calling, extended thinking
- Anthropic Claude via native InvokeModel API (thinking, effort, prompt caching)
- Amazon Nova, DeepSeek, Mistral, Llama via Converse API
- 19 providers through unified translation layer
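The split between the two Bedrock APIs can be sketched as a simple dispatch on the model ID; the prefix list below is an assumption for illustration, not the gateway's actual routing table:

```python
# Families served through the native InvokeModel path (per the list above);
# everything else falls back to the Converse API. Illustrative only.
INVOKE_MODEL_PREFIXES = ("anthropic.", "us.anthropic.", "eu.anthropic.")

def pick_bedrock_api(model_id: str) -> str:
    """Choose the Bedrock API for a given model identifier."""
    if model_id.startswith(INVOKE_MODEL_PREFIXES):
        return "invoke_model"   # Claude: thinking, effort, prompt caching
    return "converse"           # Nova, DeepSeek, Mistral, Llama, ...
```

Keeping the dispatch on the model ID means new Converse-served providers need no routing changes.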
- Prompt caching — 90% discount on reads, auto-injection of cache breakpoints (up to 4 per request)
- Per-token billing — Dynamic pricing from AWS API (181+ regional pricing records)
- Real-time tracking — Background async usage recording with per-token quota limits
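Breakpoint auto-injection can be sketched as marking the most recent stable message boundaries with `cache_control`, capped at the four-breakpoint limit. The selection heuristic here is a simplified assumption, not the gateway's exact strategy:

```python
MAX_BREAKPOINTS = 4  # Anthropic allows at most 4 cache_control markers per request

def inject_breakpoints(messages: list[dict]) -> list[dict]:
    """Mark up to 4 trailing message boundaries as cacheable prefixes."""
    marked = 0
    # Walk from the end: the most recent boundaries cover the longest
    # prefixes, so they benefit most from caching in an agent loop.
    for msg in reversed(messages):
        if marked >= MAX_BREAKPOINTS:
            break
        content = msg["content"]
        if isinstance(content, list) and content:
            # cache_control on a block caches everything up to that block.
            content[-1]["cache_control"] = {"type": "ephemeral"}
            marked += 1
    return messages
```

Clients therefore get cache hits without ever setting `cache_control` themselves.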
- API tokens: SHA256 hash index for O(1) lookup + Fernet AES-128 encrypted storage
- OAuth SSO: Cognito (default) + Microsoft Entra ID, PKCE + HttpOnly refresh cookies
- CSRF: Origin + Referer + custom header triple validation
- WAF: Rate limiting (20/300/2000 req per 5min by tier), SQLi/XSS managed rules
- Secrets: External Secrets Operator + AWS Secrets Manager, auto-sync via Pod Identity
- Kubernetes-native: EKS + Karpenter + Metrics Server
- Two deployment modes: full IaC (`deploy-all.sh`) or existing cluster (`deploy-to-existing.sh`)
- Optional Global Accelerator for Anycast low-latency routing
- Distributed Redis token bucket rate limiting with per-Pod fallback
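The per-Pod fallback path can be sketched as a plain in-memory token bucket that takes over when the distributed Redis limiter is unreachable; the class name and parameters are illustrative, not the service's actual implementation:

```python
import time

class LocalTokenBucket:
    """In-process rate-limit fallback for when Redis is down."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A local bucket is stricter than the cluster-wide budget divided by Pod count would suggest, but it fails safe: a Redis outage degrades throughput instead of removing limits entirely.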
| Layer | Technology |
|---|---|
| Frontend | Vue 3, Quasar, TypeScript, Pinia, Vite |
| Backend | Python 3.12+, FastAPI, SQLAlchemy (async), Alembic, Pydantic |
| Database | PostgreSQL (Aurora in prod), asyncpg |
| Cache | Redis (rate limiting, token caching) |
| Auth | JWT, AWS Cognito, Microsoft OAuth |
| Cloud | AWS Bedrock, EKS, ECR, WAF, Secrets Manager |
| IaC | Terraform, Karpenter, External Secrets Operator |
| Document | Description |
|---|---|
| Architecture | System overview, component diagrams, auth flows |
| API Reference | Full endpoint docs with examples |
| Request Translation | OpenAI/Anthropic to Bedrock format mapping |
| Prompt Caching | Auto-injection, cost model, breakpoint strategy |
| Pricing System | Per-token billing, dynamic pricing |
| Security | CSRF, WAF, token protection, OAuth |
| Performance | Streaming, rate limiting, timeout tuning, HPA |
| Deployment SOP | Deploy, teardown, and operations |
| OAuth Setup | Cognito & Microsoft OAuth configuration |
```bash
# Backend
cd backend
uv run ruff check .    # lint
uv run ruff format .   # format
uv run pytest          # test

# Frontend
cd frontend
npm run lint           # lint
npm run format         # format
```

MIT License — see LICENSE for details.