DIARCHERS-1396: MCP-dedicated API layer with rate limiting, auth, and audit #610

Open
sap-yuan wants to merge 7 commits into master from diarchers-1396-mcp-api-layer

sap-yuan commented Jun 5, 2026

Summary

Implements DIARCHERS-1396 by isolating all MCP/AI-agent API calls under a dedicated /api/v1/mcp/* namespace with independent rate limiting, scoped bearer tokens, and audit logging, mirroring the dhaas-control-center pattern.

  • New DB migration (00047.sql): mcp_token and mcp_access_log tables
  • src/api/handlers/mcp/auth.py: ib_mcp_* bearer token validation; check_project_access_mcp and check_trigger_access_mcp helpers
  • src/api/handlers/mcp/rate_limit.py: Redis sliding-window rate limiter (per-user per-endpoint, fail-open on Redis outage)
  • src/api/handlers/mcp/audit.py: fire-and-forget audit logging into mcp_access_log
  • src/api/handlers/mcp/token_routes.py: token CRUD at POST/GET/PATCH/DELETE /api/v1/mcp/tokens/* (session auth)
  • src/api/handlers/mcp/routes/: /api/v1/mcp/* endpoints for projects, builds, jobs, logs, artifacts, trigger
  • infrabox/test/api/mcp_test.py: 21 unit tests covering token hash, access checks, rate limiter (allow/deny/fail-open)
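
The token handling listed above can be pictured with a minimal sketch. It is not the code in auth.py: the ib_mcp_ prefix and the use of SHA-256 come from this PR, while the function names, the header parsing, and the idea that only the hash is persisted are assumptions made for illustration.

```python
import hashlib
import secrets

MCP_TOKEN_PREFIX = "ib_mcp_"  # prefix named in this PR


def generate_token() -> str:
    """Create a new raw token (hypothetical helper; length is an assumption)."""
    return MCP_TOKEN_PREFIX + secrets.token_urlsafe(32)


def hash_token(raw_token: str) -> str:
    """Return the hex SHA-256 digest of a raw token.

    Assumption: the database stores this digest rather than the raw token.
    """
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def extract_mcp_token(authorization_header):
    """Return the raw token from a 'Bearer ib_mcp_...' header, else None."""
    if not authorization_header:
        return None
    scheme, _, value = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return None
    value = value.strip()
    if not value.startswith(MCP_TOKEN_PREFIX):
        return None  # not an MCP token; other token types are handled elsewhere
    return value
```

A lookup would then compare hash_token(extract_mcp_token(header)) against the stored digest, so a leaked table does not reveal usable tokens.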

Global token OPA policy fix

Global tokens were introduced to allow AI agents to access multiple projects with a single token. However, the OPA policies covered only the project list and detail endpoints, so every other project-scoped read endpoint (builds, jobs, console, test results, and so on) returned 401.

This commit adds the missing allow rules:

  • projects_build.rego: builds list, single build, and sub-paths; writes (restart, abort, cache-clear) gated on scope_push=true
  • projects_jobs.rego: all job sub-resources (console, output, testresults, testruns, tests/history, tabs, archive, badges, stats); writes gated on scope_push=true
  • projects_commits.rego: read commits for collaborators; fixes a latent bug where the helper referenced an undefined project variable instead of project_id
  • projects_cronjobs.rego: read-only cron job list for collaborators with administrator role
  • trigger.rego: trigger gated on scope_push=true; fixes the same undefined-variable bug
  • global_viewer_test.rego: 9 new test cases covering allowed reads, denied non-collaborator access, denied writes without scope_push, and allowed writes with scope_push

Authorization model: a global token can read any endpoint under /api/v1/projects/<id>/... if and only if its user_id is a collaborator on that project. Writes additionally require scope_push=true. Admin, secrets, SSH keys, and project tokens remain inaccessible. No Python or DB changes are needed.
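
The decision rule in the authorization model can be restated as a small Python sketch. The real rules live in the Rego policies; this is only a model of the logic for readers who find Python easier to scan. The GlobalToken class, the function name, and the is_write flag are hypothetical, and how a request is classified as a write is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalToken:
    """Illustrative stand-in for the claims carried by a global token."""
    user_id: str
    scope_push: bool = False


def global_token_allowed(token: GlobalToken, collaborators: set, is_write: bool) -> bool:
    """Model of the rule: reads need collaboration, writes also need scope_push.

    `collaborators` is the set of user IDs collaborating on the target project.
    """
    if token.user_id not in collaborators:
        return False  # non-collaborators are denied for reads and writes alike
    if is_write:
        return token.scope_push  # restart, abort, cache-clear, trigger
    return True
```

The sketch deliberately omits the endpoints that stay inaccessible to global tokens (admin, secrets, SSH keys, project tokens); in the policies those simply have no matching allow rule.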


Changes

  • No impact on existing /api/v1/* endpoints: MCP tokens are rejected on all non-MCP paths
  • Per-user Redis sliding window with configurable RPM limits per endpoint (log/artifact: 10, trigger: 5, default: 30)
  • Project-scoped tokens with per-project expiry in JSONB; trigger requires explicit allow_trigger=true
  • Middleware order: auth → rate limit → project access → trigger access → handler → audit
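
To make the rate-limiting behaviour concrete, here is a minimal in-memory sketch of a per-user, per-endpoint sliding window that fails open when its store is unavailable. It is not rate_limit.py: the PR uses Redis, whereas this version swaps in a Python deque so it runs without external services. The limits (10, 5, 30) come from the list above; the endpoint keys, class names, and store interface are assumptions.

```python
import time
from collections import defaultdict, deque

# Requests per minute; endpoint keys are hypothetical labels.
RPM_LIMITS = {"logs": 10, "artifacts": 10, "trigger": 5}
DEFAULT_RPM = 30
WINDOW_SECONDS = 60.0


class InMemoryStore:
    """Stand-in for Redis: one deque of request timestamps per key."""

    def __init__(self):
        self._hits = defaultdict(deque)

    def window(self, key):
        return self._hits[key]


class SlidingWindowLimiter:
    def __init__(self, store):
        self._store = store

    def allow(self, user_id, endpoint, now=None):
        """Return True if the request may proceed."""
        now = time.monotonic() if now is None else now
        limit = RPM_LIMITS.get(endpoint, DEFAULT_RPM)
        try:
            window = self._store.window((user_id, endpoint))
            # Evict timestamps that have slid out of the last 60 seconds.
            while window and window[0] <= now - WINDOW_SECONDS:
                window.popleft()
            if len(window) >= limit:
                return False
            window.append(now)
            return True
        except Exception:
            # Fail open: a store outage must not block API traffic.
            return True
```

Failing open trades strict enforcement for availability: during an outage of the backing store, requests are allowed rather than rejected. In this sketch denied requests are not recorded, which is a design choice of the example and may differ from the Redis implementation.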

Testing

cd infrabox/test/api
PYTHONPATH=../../src python -m pytest mcp_test.py -v   # 21 passed

OPA unit tests:

opa test src/openpolicyagent/policies/

JIRA

DIARCHERS-1396

sap-yuan self-assigned this Jun 5, 2026
Yuan Huang added 2 commits June 9, 2026 12:07
… and audit

- DB migration 00046: mcp_token and mcp_access_log tables
- api/handlers/mcp/auth.py: ib_mcp_* bearer token validation, project/trigger access checks
- api/handlers/mcp/rate_limit.py: Redis sliding-window per-user per-endpoint rate limiter (fail-open)
- api/handlers/mcp/audit.py: fire-and-forget audit logging to mcp_access_log
- api/handlers/mcp/token_routes.py: token CRUD at /api/v1/mcp/tokens/*
- api/handlers/mcp/routes/: /api/v1/mcp/* endpoints for projects, builds, jobs, artifacts, trigger
- infrabox/test/api/mcp_test.py: 21 unit tests covering hash, access checks, rate limiter
sap-yuan force-pushed the diarchers-1396-mcp-api-layer branch from 5244b9e to 051d279 June 9, 2026 04:07
Yuan Huang added 5 commits June 9, 2026 12:35
- ibflask.py: recognize ib_mcp_ prefix in get_token() to skip JWT
  decode; normalize_token() passes mcp type through unchanged
- policies/mcp.rego: add OPA rules allowing mcp tokens on
  /api/v1/mcp/* data paths and session users on tokens management path
- auth.py: fix per-project expiry check to handle naive vs aware
  datetime; treat malformed expiry as denied instead of granted
- routes/builds.py: add LOCK TABLE before MAX(build_number)+1 INSERT
  to prevent duplicate build numbers under concurrency; move uuid
  import to top; drop premature audit 'attempt' before lock
- routes/projects.py: add collaborator JOIN on MCP token path so
  revoked collaborators cannot enumerate project metadata
- rate_limit.py: reset _redis_client to None on pipeline error so
  reconnect is attempted on the next request
- token_routes.py: replace inline hashlib.sha256 with _hash_token();
  move json/hashlib imports to top; fix MCPTokenTrigger.post docstring
  (permanent grant, not time-limited); remove dead body variable;
  add try/except around int(expires_days) conversion
- audit.py: remove unused threading import; fix misleading docstring;
  move json import to top
- 00046.sql: add retention comment for mcp_access_log pruning
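
The expiry fix described for auth.py (naive versus aware datetimes, malformed values denied) can be sketched as follows. This is an assumption-laden illustration, not the committed code: the function name is hypothetical, expiry values are assumed to be ISO-8601 strings, naive values are assumed to mean UTC, and a missing expiry is assumed to mean "no expiry".

```python
from datetime import datetime, timezone


def project_grant_is_valid(expires_at, now=None) -> bool:
    """Return True if a per-project grant has not expired.

    expires_at: ISO-8601 string, or None for "no expiry" (assumption).
    now: timezone-aware datetime; defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if expires_at is None:
        return True
    try:
        expiry = datetime.fromisoformat(expires_at)
    except (TypeError, ValueError):
        # Malformed expiry is treated as denied rather than granted.
        return False
    if expiry.tzinfo is None:
        # Assumption: naive timestamps are UTC. Making them aware avoids
        # the TypeError raised when comparing naive and aware datetimes.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return now < expiry
```

Denying on a parse failure is the safer default here: a corrupted expiry value then shortens access instead of silently extending it.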
…onify()

flask-restx's make_response wrapper cannot accept a Flask Response
object as data: it tries to JSON-serialize the object and raises
TypeError. All MCP handler methods must therefore return a plain dict
or list (plus an optional status code), not a jsonify() Response object.

Because flask-restx's make_response cannot accept Flask Response
objects, _reject() must raise via abort() so that the error handler
formats the response, not the Resource wrapper.

The test environment DB already has schema_version=46 from a prior
development iteration (global_token columns) that was later consolidated
into 00045.sql. Keeping mcp_token/mcp_access_log as 00046 causes the
migration runner to skip it. Renaming it to 00047 ensures it runs on any
environment where 00046 is already marked as applied.

mcp.rego is automatically included in the OPA image via the existing
Dockerfile COPY instruction (src/openpolicyagent/policies → /policies),
so no separate deployment step is needed.
…gger

Global tokens were only authorized for project list/detail endpoints.
All project-scoped read endpoints (builds, jobs, console, testresults,
archive, stats, etc.) returned 401 because no OPA allow rules existed
for token.type = "global".

This commit adds the missing allow rules across all affected policy
files, following the same collaborator-check pattern already used for
user tokens:

- Read operations (builds, jobs and all sub-resources): allowed when
  global_token.user_id is a collaborator on the target project.
- Write operations (restart, abort, rerun, cache-clear, trigger):
  additionally require global_token.scope_push = true.

Also fixes a latent bug in projects_commits.rego and trigger.rego
where the helper functions referenced an undefined `project` variable
instead of the correct `project_id` parameter.

New OPA test cases in global_viewer_test.rego cover:
- builds list and single build allowed for collaborator
- job console and stats allowed for collaborator
- read denied for non-collaborator project
- restart/trigger denied without scope_push
- restart/trigger allowed with scope_push