Technical Brief — OnlyAllowAI Riddle Firewall

01 · Executive Summary

What it is, in one paragraph.

A drop-in firewall for AI traffic. Replace your existing LLM endpoint URL with an OnlyAllowAI URL and keep using the same SDK. Behind the scenes, every request is tested against a competency contract you define: passing requests stream through with negligible latency, and failing ones are blocked with structured per-field feedback.

Why it matters to the business

1. You stop bad AI calls before they cost you money. Every upstream request to OpenAI / Anthropic / Groq / Google is billed by token. OnlyAllowAI denies the request before the upstream call is opened — you don't pay for blocked traffic.

2. You meet compliance without re-architecting. Every decision is persisted to Postgres with a stable event ID, the API key used, the riddle attempted, and the model targeted. SOC 2, ISO 27001, and internal audit teams can query the firewall_events table directly.

3. You keep one kill-switch for every AI agent. A single PATCH /v1/keys/<id> { "disabled": true } stops all traffic from that agent — instantly, across every Cloud Run replica, with no deploy.

02 · Architecture

Inline gate, decoupled observation.

The hot path runs the gate decision and the upstream stream on a single coroutine — no queues, no thread hops. The dashboard reads a separate event bus and can never stall a customer request.

🛰️

Inline, in-process

The gate decision runs on the same async coroutine that opens the upstream connection. No IPC, no queue, no second hop.

⚡

Zero buffering

Once the gate passes, the response is streamed byte-for-byte via httpx Response.aiter_raw() + FastAPI StreamingResponse.

🪟

Out-of-band telemetry

The Looking Glass dashboard reads a separate EventBus. Close every browser tab and the proxy doesn't notice.
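The decoupling between the proxy and its observers can be illustrated with a small fan-out bus. This is a minimal sketch, not the product's EventBus: the class shape, the `max_queue` parameter, and the drop-on-full policy are assumptions. It only shows why a non-blocking `put_nowait()` keeps a slow or absent subscriber from stalling the publisher.

```python
import asyncio


class EventBus:
    """Hypothetical fan-out bus: the proxy publishes, dashboards consume."""

    def __init__(self, max_queue=1000):
        self._subscribers = set()
        self._max_queue = max_queue
        self.dropped = 0  # events discarded because a subscriber fell behind

    def subscribe(self):
        """Register a new subscriber and return its private bounded queue."""
        q = asyncio.Queue(maxsize=self._max_queue)
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q):
        self._subscribers.discard(q)

    def publish(self, event):
        # Synchronous and non-blocking: a full queue drops the event for
        # that subscriber instead of making the publisher wait.
        for q in self._subscribers:
            try:
                q.put_nowait(event)
            except asyncio.QueueFull:
                self.dropped += 1
```

With zero subscribers, `publish()` is a no-op loop, which matches the claim that closing every dashboard tab has no effect on the request path.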

03 · Decision Model

Every request resolves to one of four outcomes.

Three of these allow the request through with zero buffering. The fourth blocks the request before the upstream LLM is contacted — so denied traffic costs you nothing in upstream tokens.

SPEED_PASS

Cached competence

The agent already cleared this domain; cert in cache.

< 1 ms · forward

NO_RIDDLE

No rule attached

The asset has no riddle defined — allow by default.

< 1 ms · forward (EXPOSED)

PASSED

Riddle solved

100% of expected fields correct; cert issued (TTL 1h).

5–50 ms · forward

DENIED

Riddle failed

Any field wrong → 403 with per-field feedback. Upstream never called.

5–50 ms · blocked
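The four outcomes above can be read as a small decision function. The sketch below is illustrative only: the enum, the function names, and the argument list are hypothetical, and it ignores the org-level deny-by-default policy described later in this brief.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SPEED_PASS = "SPEED_PASS"
    NO_RIDDLE = "NO_RIDDLE"
    PASSED = "PASSED"
    DENIED = "DENIED"


def resolve_outcome(has_cert: bool, riddle_attached: bool,
                    score: Optional[float]) -> Outcome:
    """Map gate inputs to one of the four outcomes (assumed ordering)."""
    if has_cert:
        return Outcome.SPEED_PASS      # cached competence, no grading
    if not riddle_attached:
        return Outcome.NO_RIDDLE       # allow by default
    if score == 1.0:
        return Outcome.PASSED          # every expected field correct
    return Outcome.DENIED              # any field wrong, or no answer graded


def forwards_upstream(outcome: Outcome) -> bool:
    """Only DENIED stops the request before the upstream call."""
    return outcome is not Outcome.DENIED
```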

04 · Request Lifecycle

From bearer token to upstream byte — every step.

Every numbered step below runs on the same coroutine. Between the gate decision (made by step ⑧ at the latest) and the upstream forward (step ⑩), the proxy holds no lock, no database connection, and no Vault handle.

①

Resolve agent + domain

Auth dependency snapshots the API-key binding tuple (org_id, department_id, asset_id, riddle_id, provider, disabled) onto request.state.api_key_binding.

no DB hit on hot path

②

Disabled-key short circuit

If api_keys.disabled = true, immediately return 403 agent_disabled and emit proxy.blocked. No riddle is even pulled.

③

Rate limit check

Sliding-window ZSET in Redis: per-org 60 req/min + per-agent 30 req/min. In-memory fallback if Redis is unavailable.

④

Certificate cache lookup

Key: oaai:cert:{agent_id}:{gate_domain}. Cache hit → SPEED_PASS, riddle is never selected, grading never runs.

sub-millisecond hit

⑤

Riddle selection

Resolution: bound_riddle_id → asset_id → riddle_id Redis cache (TTL 60s) → in-memory RiddleStore.select_for_challenge(domain, difficulty). No SQL on the repeat path.

⑥

Auto-format the agent's answer

AutoFormatter normalises raw output: strips markdown code fences, tries JSON, falls back to line-by-line key: value extraction. Returns a clean dict.

⑦

Grade each field

GateHandshake.evaluate() runs the OutputValidator across every expected_output using its declared match_type (exact / contains / regex). Score = correct ÷ total. Verdict = PASS only when score == 1.0.

⑧

Mint token + issue certificate

On PASS: signed GateToken (JWT, TTL 5 min) for scope access:<domain> + CapabilityCertificate cached in Redis for the next 1 hour.

enables Speed Pass

⑨

Emit firewall event

One stable event_id per real request → in-memory subscribers (SSE) and Postgres firewall_events table. queue.put_nowait() never blocks the proxy.

⑩

Stream upstream byte-for-byte

httpx.AsyncClient.stream("POST", upstream, …) + async for chunk in resp.aiter_raw(): yield chunk. nginx is configured with proxy_buffering off and Cloud Run with --timeout 900s for long generations.

zero buffering SSE-native
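Step ⑥ (auto-formatting) is the least obvious part of the lifecycle, so here is a rough sketch of the described behaviour: strip a markdown code fence, try JSON, then fall back to line-by-line key: value extraction. The function name and the exact regular expressions are assumptions; the real AutoFormatter may differ.

```python
import json
import re

_TICKS = "`" * 3  # markdown code-fence marker, built without writing it literally
_FENCE = re.compile(
    r"^" + _TICKS + r"[\w-]*[ \t]*\n(.*?)\n?" + _TICKS + r"\s*$", re.DOTALL
)
_KEY_VALUE = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*:\s*(.*)$")


def auto_format(raw: str) -> dict:
    """Normalise raw agent output into a flat answer dict."""
    text = raw.strip()
    fenced = _FENCE.match(text)
    if fenced:
        text = fenced.group(1).strip()   # keep only the fenced body
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass                             # not JSON: fall through
    answer = {}
    for line in text.splitlines():
        m = _KEY_VALUE.match(line)
        if m:                            # narrative lines are dropped
            answer[m.group(1)] = m.group(2).strip()
    return answer
```

Lines that do not look like `identifier: value` are simply ignored, which is one way to discard extra narrative before grading.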

05 · Grading Model

Deterministic, per-field, no LLM-in-the-loop.

Grading is pure CPU. There is no AI judging another AI — that would be slow, expensive, and non-deterministic. Instead each riddle ships with an answer key and three match strategies.

🟰

exact

String-normalised equality. str(submitted).strip() == str(expected).strip(). Perfect for IDs, project names, version numbers.

🔍

contains

Substring check: expected in submitted. Use for bucket names, partial paths, or "must mention X".

⚙️

regex

Pattern match: re.search(pattern, submitted). Use for IPs, semver ranges, free-form constraints.

example_riddle.json — expected output spec

// Riddle: GCP project config extraction
{
  "gate_domain": "cloud_infrastructure",
  "difficulty": "standard",
  "prompt": "PROJECT_ID=acme-prod\nREGION=us-central1\nBUCKET=acme-prod-data",
  "expected_outputs": [
    { "field_name": "project_id", "expected_value": "acme-prod",    "match_type": "exact"    },
    { "field_name": "region",     "expected_value": "us-central1",  "match_type": "exact"    },
    { "field_name": "bucket",     "expected_value": "acme-prod",    "match_type": "contains" }
  ]
}

A score of less than 1.0 is a failure. Partial credit is recorded for training feedback but the gate does not open. The 403 response carries a feedback object with one entry per field — so the calling agent can self-correct deterministically.
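As an illustration of the grading rules described in this section, the sketch below applies the three match types to an answer dict and computes score = correct ÷ total. The function names and the shape of the feedback entries are hypothetical; only the match semantics and the "PASS only at 1.0" rule come from this brief.

```python
import re


def match_field(expected: str, submitted, match_type: str) -> bool:
    """Apply one of the three match strategies to a single field."""
    value = str(submitted).strip()
    if match_type == "exact":
        return value == str(expected).strip()
    if match_type == "contains":
        return str(expected) in value
    if match_type == "regex":
        return re.search(expected, value) is not None
    raise ValueError(f"unknown match_type: {match_type}")


def grade(expected_outputs: list, answer: dict) -> dict:
    """Grade every expected field; the verdict is PASS only at score 1.0."""
    feedback = []
    for spec in expected_outputs:
        name = spec["field_name"]
        ok = name in answer and match_field(
            spec["expected_value"], answer[name], spec["match_type"]
        )
        feedback.append({"field_name": name, "correct": ok})
    correct = sum(1 for entry in feedback if entry["correct"])
    # Assumption: a riddle with no expected fields never passes.
    score = correct / len(feedback) if feedback else 0.0
    return {
        "verdict": "PASS" if score == 1.0 else "FAIL",
        "score": score,
        "feedback": feedback,
    }
```

Run against the example riddle, an answer of `{"project_id": "acme-prod", "region": "us-east1", "bucket": "acme-prod-data"}` gets two of three fields right, so the gate stays closed and the feedback flags `region`.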

06 · Speed Pass

How we keep firewalled traffic fast.

The first time an agent passes a riddle for a domain, a Capability Certificate is issued and cached. Every subsequent request for the same (agent_id, domain) pair is allowed through with a single Redis GET — no riddle pulled, no grading run.

stateDiagram-v2
    [*] --> Pending : Agent first request
    Pending --> Challenged : Riddle selected
    Challenged --> Active : Score == 1.0 / cert issued
    Challenged --> Pending : Score < 1.0 / 403 + feedback
    Active --> SPEED_PASS : Subsequent request hits cache
    Active --> Expired : TTL elapsed (default 1h)
    Active --> Revoked : Admin revokes / riddle edited
    Expired --> Challenged : Re-challenge on next call
    Revoked --> Challenged : Re-challenge on next call
    SPEED_PASS --> Active : Cache still warm

📦

Redis-backed

Production uses RedisTokenManager — certs are shared across every Cloud Run replica. No cold-cert penalty on autoscale.

⏱️

Bounded TTL

Default 1-hour TTL keeps the privilege window small. Adjust per-org or per-domain if your risk model requires shorter intervals.

🔥

Auto-revocation

Edit a riddle → every cert that solved it is revoked. Disable an API key → the next request is denied before grading.
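The certificate lifecycle above (issue, hit, expire, revoke) can be sketched with an in-memory stand-in for the Redis-backed cache. Everything here is an assumption made for illustration, including the class name and the injectable clock; only the key format `oaai:cert:{agent_id}:{gate_domain}` and the 1-hour default TTL come from this brief.

```python
import time
from typing import Callable, Dict, Tuple


class CertCache:
    """Hypothetical in-process stand-in for the shared certificate cache."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._certs: Dict[str, Tuple[str, float]] = {}  # key -> (riddle_id, expires_at)

    @staticmethod
    def key(agent_id: str, gate_domain: str) -> str:
        return f"oaai:cert:{agent_id}:{gate_domain}"

    def issue(self, agent_id: str, gate_domain: str, riddle_id: str) -> None:
        expires_at = self._clock() + self._ttl
        self._certs[self.key(agent_id, gate_domain)] = (riddle_id, expires_at)

    def has_valid(self, agent_id: str, gate_domain: str) -> bool:
        k = self.key(agent_id, gate_domain)
        entry = self._certs.get(k)
        if entry is None:
            return False
        if self._clock() >= entry[1]:   # TTL elapsed: re-challenge next call
            del self._certs[k]
            return False
        return True

    def revoke_for_riddle(self, riddle_id: str) -> int:
        """Drop every cert earned on the given riddle; return how many."""
        doomed = [k for k, (rid, _) in self._certs.items() if rid == riddle_id]
        for k in doomed:
            del self._certs[k]
        return len(doomed)
```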

07 · Threat Coverage

What the gate stops — and how.

The Riddle Firewall is a contract enforcer. It does not try to out-think a malicious prompt — it requires the AI to prove it can extract the right values from a known input.

Threat | How it shows up | How we stop it
Prompt injection | Hostile text smuggled into context tries to make the AI ignore guardrails. | AutoFormatter strips everything except the answer dict. Any extra narrative is discarded before grading.
Capability drift | A model that used to extract project_id correctly suddenly hallucinates one. | The riddle is re-graded every time a cert expires, so drift is caught at the next refresh rather than by a customer.
Compromised agent | Stolen oaai-sk-… key being used by an unauthorised process. | PATCH /v1/keys/<id> { "disabled": true }: instant, global, no deploy. A proxy.blocked event is emitted for forensics.
Provider hop | Agent bound to OpenAI tries to call Anthropic to bypass a policy. | Provider lock on the API key → mismatched model → 403 provider_mismatch.
Runaway cost | An agent loops, billing thousands of upstream tokens. | Rate limits (Redis ZSET) + per-org credit ledger. Blocked requests cost zero upstream tokens.
Silent failure | Bad AI output reaches production unnoticed. | Every decision is persisted to firewall_events with the riddle, score, feedback, and elapsed time.
Stale privilege | Riddle is tightened but agents keep using old certs. | Riddle update auto-revokes every cert that solved the old version (Redis SCAN + in-memory sweep).
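The provider-hop row can be made concrete with a short check. This is a sketch under stated assumptions: the prefix table is hypothetical (the brief only says providers are detected by model name), and denying unrecognised models on a locked key is a design choice made here, not documented behaviour.

```python
from typing import Optional, Tuple

# Hypothetical prefix table; the real detection rules are not given here.
_MODEL_PREFIXES = (
    ("gpt-", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("grok-", "xai"),
)


def detect_provider(model: str) -> Optional[str]:
    """Guess the provider from the model name, or None if unrecognised."""
    name = model.lower()
    for prefix, provider in _MODEL_PREFIXES:
        if name.startswith(prefix):
            return provider
    return None


def check_provider_lock(model: str,
                        provider_bound: Optional[str]) -> Optional[Tuple[int, str]]:
    """Return None to allow, or (status, error) to deny the request."""
    if provider_bound is None:
        return None                        # key is not locked to a provider
    if detect_provider(model) != provider_bound:
        return (403, "provider_mismatch")  # includes unrecognised models
    return None
```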

08 · Performance & Scale

Designed for LLM-request volume.

⚡

p50 overhead

< 1 ms on the speed-pass path. 5–50 ms on the cold-grade path. Streaming response start time is dominated by the upstream LLM, not by us.

🔄

Concurrency

Async coroutines on every request. Cloud Run autoscales horizontally; Redis-shared cert cache means a SPEED_PASS earned on one replica is honoured by every other.

🚦

Rate limiting

Sliding-window ZSET in Redis. Per-org 60 req/min + per-agent 30 req/min by default; both tunable per-org.

📡

Streaming

httpx Response.aiter_raw() + StreamingResponse. nginx proxy_buffering off, Cloud Run --timeout 900s. Long generations stream uninterrupted.

🔌

No DB on hot path

Auth binding snapshotted by the auth dependency. The only residual SQL is BYOK provider-key lookup — one short-lived session, closed before the upstream stream starts.

🎯

Anthropic normalisation

Anthropic Messages SSE is rewritten on-the-wire as OpenAI chat.completion.chunk SSE, so existing OpenAI/LiteLLM SDKs work unchanged through the firewall.
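To show what the Anthropic normalisation involves, the sketch below converts one decoded Anthropic streaming event into an OpenAI-style chat.completion.chunk. It is a simplified illustration, not the product's rewriter: the function names are hypothetical, only text deltas and the final stop event are handled, and mapping `message_stop` to `finish_reason: "stop"` is an assumption.

```python
import json
import time
from typing import Optional


def anthropic_event_to_openai_chunk(event: dict, chunk_id: str, model: str,
                                    created: Optional[int] = None) -> Optional[dict]:
    """Translate one Anthropic stream event; return None for events to skip."""
    delta = None
    finish_reason = None
    if event.get("type") == "content_block_delta":
        inner = event.get("delta", {})
        if inner.get("type") == "text_delta":
            delta = {"content": inner.get("text", "")}
    elif event.get("type") == "message_stop":
        delta, finish_reason = {}, "stop"
    if delta is None:
        return None  # ping, message_start, block start/stop, etc.
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": created if created is not None else int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def to_sse_line(chunk: dict) -> str:
    """Serialise a chunk as a single SSE data frame."""
    return "data: " + json.dumps(chunk) + "\n\n"
```

Because each event is translated independently, a rewriter of this shape can run inside the streaming loop without holding more than one event in memory.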

09 · Compliance & Audit

Every decision is queryable.

Every SPEED_PASS, PASSED, DENIED, and proxy.blocked event is persisted to a Postgres table. Your SOC 2 / ISO 27001 / internal audit team gets the same view as your operators.

firewall_events table — migration 010

CREATE TABLE firewall_events (
    event_id     UUID      PRIMARY KEY,
    org_id       UUID      NOT NULL REFERENCES organizations,
    event_type   VARCHAR,  -- riddle.passed / proxy.blocked / firewall.allow ...
    agent_id     VARCHAR,
    api_key_id   UUID,
    gate_domain  VARCHAR,
    riddle_id    UUID,
    outcome      VARCHAR,  -- passed / failed / speed_pass / no_riddle / blocked
    model        VARCHAR,  -- gpt-4o, claude-3-5-sonnet, ...
    provider     VARCHAR,  -- openai / anthropic / groq / google / xai
    elapsed_ms   INTEGER,
    payload      JSONB,    -- full enriched event (agent, dept, asset, feedback)
    created_at   TIMESTAMPTZ DEFAULT NOW()
);

👤

User Accountability

Every human action (create / edit / delete riddle, toggle API key, change settings) is logged to user_audit_log with IP address and target.

🤖

AI Accountability

Every AI attempt is logged to attempts with submitted_outputs, score, and feedback. Per-agent and per-riddle indexes for fast queries.

🔐

Secrets handling

BYOK provider keys are AES-encrypted at rest (core/key_crypto.py). Inspector ring buffer redacts Authorization / api_key patterns before storage.
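The redaction step mentioned under Secrets handling might look roughly like the following. The patterns are illustrative assumptions, not the Inspector's actual rules; they cover bearer tokens, `sk-` / `oaai-sk-` style key literals, and `api_key` fields in JSON or key=value text.

```python
import re

# Hypothetical redaction rules, applied in order.
_RULES = [
    # "Bearer <token>" in header-style text
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # bare key literals such as sk-... or oaai-sk-...
    (re.compile(r"\b(?:oaai-sk|sk)-[A-Za-z0-9_-]+"), "[REDACTED]"),
    # api_key fields in JSON or key=value form
    (re.compile(r"(?i)(\"?api[_-]?key\"?\s*[:=]\s*\"?)[^\"\s,}]+"), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Mask credential-looking substrings before the text is stored."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction is best-effort: it reduces the chance of a secret landing in a ring buffer but does not replace keeping secrets out of logged payloads in the first place.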

10 · Integration

One line of code.

Point the base URL of your OpenAI / Anthropic client at api.onlyallow.ai and authenticate with your OnlyAllowAI key. Your existing SDKs, retry logic, streaming code paths, and observability all continue to work.

integration.py — drop-in replacement

from openai import OpenAI

# BEFORE — going direct to OpenAI
# client = OpenAI(api_key="sk-...")

# AFTER — same SDK, firewalled traffic
client = OpenAI(
    api_key="oaai-sk-...",                              # your OnlyAllowAI key
    base_url="https://api.onlyallow.ai/v1",            # <— one line changed
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "…"}],
    stream=True,
    extra_body={"oaai_answer": {"project_id": "acme-prod"}},  # riddle answer
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Same API surface, three transparent guarantees: (1) provider lock on the key forces requests to the correct upstream; (2) Speed Pass keeps repeat-request overhead negligible; (3) every call is recorded with a stable event_id for audit.

10b · Developer Reference

Everything an engineer needs to integrate.

Endpoints, authentication, error codes, streaming, SSE event payloads, rate limits, BYOK setup, and SDK snippets across four languages. Bookmark this section.

Base URLs & Authentication

🌐

Production

https://api.onlyallow.ai — same-origin reverse proxy through nginx to Cloud Run. Use this for SDK base_url and all production traffic.

🧪

Direct Cloud Run

https://onlyallow-api-47084672302.us-central1.run.app — useful for staging tests when bypassing nginx. Skips the SSE-aware buffer-off proxy block.

🔑

Auth header

Authorization: Bearer oaai-sk-… on every request. JWTs are used internally for the gate-handshake flow; you don't see them.

Core Endpoints

Method | Path | Purpose
POST | /v1/chat/completions | OpenAI-compatible chat. Streams when stream: true. Anthropic / Groq / Google / xAI / Ollama are all dispatched transparently by model name. Carries the riddle answer in the top-level oaai_answer body field (passed via extra_body in the OpenAI Python SDK).
POST | /v1/messages | Anthropic-native messages endpoint. SSE rewritten on-the-wire to OpenAI chunks if your client expects them.
GET | /v1/events/stream | Server-Sent Events firehose: one event per firewall decision for the authenticated org. Used by the Looking Glass dashboard.
GET | /v1/events/stats | Aggregate counters (24h window): total, forwarded, blocked, speed_pass, pass_rate, by_provider, by_department, by_outcome.
GET | /v1/keys/oaai/ | List OAAI keys for the current org with bindings.
POST | /v1/keys/oaai/ | Issue new OAAI key. Body: { name, department_id?, asset_id?, riddle_id?, provider? }.
PATCH | /v1/keys/<id> | Update bindings or set disabled: true. Disabling takes effect on the next request, globally, with no deploy.
POST | /v1/riddles | Create or update a riddle. Bumps version and auto-revokes prior certs.
GET | /health · /auth/health | Liveness probes: no auth, no DB, no Redis. Return {"status":"ok"}. Safe for Kubernetes / Cloud Run liveness checks.

Error Codes

HTTP | error field | Meaning & remediation
401 | invalid_api_key | Missing or malformed Authorization header. Check the prefix is oaai-sk-.
403 | agent_disabled | The OAAI key is disabled. Re-enable via PATCH /v1/keys/<id> { "disabled": false }.
403 | riddle_failed | Field-level feedback object in the response body shows the mismatch per expected_output.
403 | provider_mismatch | Model name resolves to a different provider than the key's provider_bound. Either remove the lock or use a matching model.
403 | require_riddle | Org has deny-by-default enabled and the asset has no riddle attached. Attach one or relax the policy.
429 | rate_limited | Sliding-window limit hit. Response includes retry_after_ms. Default 60 req/min/org & 30 req/min/agent.
502 | upstream_error | Upstream LLM returned a transport failure. Original status & body included in upstream.
503 | quota_exhausted | Org credit ledger balance ≤ 0. Refill via /v1/billing/topup.
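A client can turn the error codes above into a simple retry policy. The sketch below is one possible mapping, assuming the response body is a JSON object with an `error` field plus, where documented, `retry_after_ms` or `feedback`; the function name, the returned action dicts, and the 1-second fallback delay are hypothetical.

```python
def next_action(status: int, body: dict) -> dict:
    """Decide what a calling agent should do with a firewall error."""
    error = body.get("error")
    if status == 429 and error == "rate_limited":
        # Honour the server's hint; fall back to 1 s if it is missing.
        return {"action": "retry", "delay_s": body.get("retry_after_ms", 1000) / 1000}
    if status == 502:
        return {"action": "retry", "delay_s": 1.0}   # transient upstream failure
    if status == 403 and error == "riddle_failed":
        # Per-field feedback lets the agent correct its answer and resubmit.
        return {"action": "fix_answer", "feedback": body.get("feedback")}
    # invalid_api_key, agent_disabled, provider_mismatch, require_riddle,
    # quota_exhausted: retrying unchanged will not help.
    return {"action": "stop", "reason": error}
```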

SSE Event Payload (firehose)

GET /v1/events/stream — SSE chunk

event: firewall.allow
data: {
  "event_id":        "emo-1063",
  "org_id":          "a1b2c3…",
  "user_id":         "u-456",
  "user_email":      "ops@acme.com",
  "agent_id":        "agent-llama-70b",
  "api_key_id":      "k-789",
  "api_key_name":    "prod-llama-key",
  "api_key_prefix":  "oaai-sk-aBc1",
  "domain":          "analytics",
  "provider":        "ollama",
  "provider_bound":  "ollama",
  "model":           "llama-3.1-70b",
  "department":      "dept-001",
  "department_name": "Analytics",
  "asset":           "asset-022",
  "asset_name":      "looker-dashboards",
  "riddle_id":       "r-555",
  "module_type":     "human",         // human-bound vs ai-assigned
  "outcome":         "speed_pass",    // passed / failed / speed_pass / no_riddle / blocked
  "elapsed_ms":      1,
  "created_at":      "2026-05-17T12:39:48Z"
}

Event types emitted: riddle.passed, riddle.failed, proxy.forward, proxy.blocked, firewall.allow, firewall.deny. Every event lands both on the SSE bus and in the firewall_events table.
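Consumers that want both the event type and the payload need to parse the `event:` and `data:` lines together. The following is a minimal stdlib sketch of a Server-Sent Events parser following the standard framing rules (blank line dispatches, multiple `data:` lines are joined with newlines, lines starting with `:` are comments). The function name is hypothetical, and it assumes every payload is JSON.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_type, payload) pairs from an iterable of SSE lines."""
    event_type = "message"          # SSE default when no event: line is sent
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":              # blank line: dispatch the buffered event
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "message", []
        elif line.startswith(":"):
            continue                # comment / keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]   # the spec strips one leading space
            data_parts.append(value)
```

Fed the lines of a firehose response, this yields pairs such as `("firewall.allow", {...})`, so a consumer can filter on the event type without inspecting the payload.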

SDK Snippets

cURL — one-shot chat completion

curl -N https://api.onlyallow.ai/v1/chat/completions \
  -H "Authorization: Bearer oaai-sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role":"user","content":"Summarise the project."}],
    "oaai_answer": {"project_id":"acme-prod","region":"us-central1"}
  }'

Node.js — OpenAI SDK + streaming

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OAAI_KEY,
  baseURL: "https://api.onlyallow.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-3-5-sonnet",
  stream: true,
  messages: [{ role: "user", content: "…" }],
  // oaai_answer is not part of the OpenAI SDK types; TypeScript users may need a cast.
  oaai_answer: { project_id: "acme-prod" },          // riddle answer
});

for await (const chunk of stream) {
  // Some chunks may carry no choices or no content delta.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Python — consuming the SSE firehose

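import asyncio, json, os
import httpx

OAAI_KEY = os.environ["OAAI_KEY"]  # assumes the key is exported in the environment

def alert_siem(evt):
    print("BLOCKED", evt)  # placeholder: forward to your SIEM here

async def main():
    async with httpx.AsyncClient(timeout=None) as c:
        async with c.stream(
            "GET",
            "https://api.onlyallow.ai/v1/events/stream",
            headers={"Authorization": f"Bearer {OAAI_KEY}"},
        ) as r:
            r.raise_for_status()
            async for line in r.aiter_lines():
                # assumes each event's JSON arrives on a single data: line
                if line.startswith("data: "):
                    evt = json.loads(line[6:])
                    if evt.get("outcome") == "blocked":
                        alert_siem(evt)

asyncio.run(main())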

Go — admin: disable a compromised key (kill-switch)

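package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// killKey disables an OAAI key through the admin API (kill-switch).
func killKey(adminToken, keyID string) error {
    body := []byte(`{"disabled": true}`)
    req, err := http.NewRequest(
        http.MethodPatch,
        "https://api.onlyallow.ai/v1/keys/"+keyID,
        bytes.NewReader(body),
    )
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer "+adminToken)
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close() // always release the connection
    if resp.StatusCode >= 300 {
        return fmt.Errorf("disable key %s: unexpected status %s", keyID, resp.Status)
    }
    return nil // effective on the very next request to that key, globally
}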

BYOK Provider Setup

🔐

Where keys live

provider_keys table, AES-encrypted at rest via core/key_crypto.py. Decryption is just-in-time for the upstream request and never logged.

⚙️

Supported providers

openai · anthropic · groq · google · xai · ollama. Detected by model name; lock per-key with provider.

📤

How to register

POST /v1/keys/byok with { provider, api_key, label? }. The key is encrypted before the SQL INSERT — plain bytes never touch the DB.

Self-Hosting Quickstart

Local dev — docker-compose

# 1. Clone
git clone https://github.com/onlyallowai/onlyallowai.git
cd onlyallowai

# 2. Boot Postgres + Redis + API
docker compose up -d

# 3. Apply migrations
docker compose exec api alembic upgrade head

# 4. Health check
curl http://localhost:8000/health
# → {"status":"ok"}

# 5. Issue your first OAAI key (admin token from .env)
curl -X POST http://localhost:8000/v1/keys/oaai/ \
  -H "Authorization: Bearer $ADMIN" \
  -H "Content-Type: application/json" \
  -d '{"name":"prod-key","provider":"openai"}'

Reference deployment: Cloud Run + Cloud SQL + Memorystore Redis on GCP, fronted by nginx with proxy_buffering off for SSE. Terraform IaC ships in infra/terraform/ (7 modules) — see the deployment guide for the production wiring.

Rate Limits & Limits That Matter

Limit | Default | How to change
Per-org rate | 60 req / minute | Tunable per-org via organizations.rate_limit_per_min.
Per-agent rate | 30 req / minute | Sliding-window ZSET in Redis, in-memory fallback when Redis is down.
Speed-Pass cert TTL | 3600 s (1 hour) | Env var OAAI_CERT_TTL_SECONDS.
Gate-token JWT TTL | 300 s (5 min) | Env var OAAI_GATE_TOKEN_TTL.
Upstream timeout | 900 s | Cloud Run --timeout + nginx proxy_read_timeout.
SSE ring buffer | 1000 events | In-process; Postgres firewall_events is the durable record.
Max payload | 1 MiB request body | nginx client_max_body_size. Streaming responses are unbounded.
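For readers unfamiliar with sliding-window limiting, here is a minimal in-memory sketch of the idea, roughly what an in-process fallback could look like. The class and its injectable clock are assumptions for illustration; the Redis version would keep the same timestamps in a sorted set (ZADD to record a hit, ZREMRANGEBYSCORE to prune, ZCARD to count).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-key sliding-window limiter kept in process memory."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Prune timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

    def retry_after_ms(self, key: str) -> int:
        """Milliseconds until the oldest hit leaves the window (0 if room)."""
        hits = self._hits[key]
        if len(hits) < self.limit:
            return 0
        return max(0, int((hits[0] + self.window_s - self._clock()) * 1000))
```

Using `SlidingWindowLimiter(limit=30)` keyed by agent and a second instance with `limit=60` keyed by org would mirror the two defaults in the table.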

Local Testing Cheatsheet

🧪

Unit + integration

python -m pytest tests/ -q --ignore=tests/test_v2 -m "not db": the same gate that deploy.ps1 runs.

🔬

Full DB suite

pytest tests/ -v with Cloud SQL reachable (or Docker Compose Postgres). 209 tests, ~6s.

📈

Coverage

pytest --cov=api --cov=gate_layer --cov=riddle_matrix --cov-report=html. Open htmlcov/index.html.

Versioning & Backwards Compatibility

The public API is prefixed /v1. Breaking changes go to /v2 — the two versions run side-by-side for at least one quarter.
SSE payload fields are additive only. Never remove a field from /v1/events/stream; clients can assume keys remain stable.
Database migrations are forward-compatible: new columns are nullable or carry a default, and the API tolerates their absence on the old schema for one release cycle.
The OAAI key prefix oaai-sk- is stable. Do not parse the suffix; treat the whole string as opaque.

11 · FAQ

What technical teams ask first.

Does the firewall add latency to streaming responses?

On the speed-pass path: under 1 ms before the first upstream byte is requested. On the cold-grade path: 5–50 ms (pure CPU). Once the gate passes, response chunks stream byte-for-byte via httpx Response.aiter_raw() with no buffering; the proxy does not decode the upstream stream, so it adds no per-chunk processing delay.

How do you avoid being a single point of failure?

Cloud Run autoscales horizontally — multiple replicas, no sticky sessions.
Redis cert cache is shared across replicas; if Redis is unavailable, the gate falls back to in-process caching (logs warning).
Rate limiter has the same in-memory fallback.
Postgres is regional with point-in-time recovery; the hot path holds no DB connections during grading.
Health endpoints (/health, /auth/health) don't touch DB / Redis, so a dependency outage never gets a healthy replica restarted by its liveness probe.

What happens if I call an LLM I don't have a riddle for?

The request resolves to NO_RIDDLE and is forwarded unmodified, with the asset showing as EXPOSED in the dashboard. The default is allow for backwards compatibility; you opt in to enforcement by attaching a riddle. If your policy requires deny-by-default, flip the org-level require_riddle flag.

How are riddles stored and edited?

Riddles live in Postgres (riddles table) and are mirrored into an in-memory RiddleStore at boot and on every CRUD operation. Edits are picked up by the firewall on the next request — no redeploy required. The version field is auto-bumped on update, which auto-revokes every certificate that was earned by solving the previous version.

What happens to in-flight requests if I revoke a cert?

In-flight streams complete normally; we never interrupt the byte channel. The next request from that agent for that domain pays the full grading cost. This preserves the no-buffering guarantee while still giving operators a kill switch. Note that PATCH /v1/keys/<id> { "disabled": true } does not terminate an in-flight stream either: it blocks new requests, not bytes already mid-flight.

How do you handle BYOK (bring-your-own-key) providers?

Each org can register provider keys for OpenAI / Anthropic / Groq / Google / xAI / Ollama. Keys are AES-encrypted at rest using core/key_crypto.py. The auth-bound OAAI key can be locked to one provider — mismatched model requests are denied with 403 provider_mismatch.

Is the dashboard required for the firewall to work?

No. The dashboard reads a separate event bus and can never stall a request. Close every browser tab and the gate keeps working, riddles still enforce, Speed Pass still fires, audit rows still land in Postgres.

Can I self-host?

Yes. Terraform IaC ships in infra/terraform/ (7 modules). Cloud Run + Cloud SQL + Memorystore Redis on GCP is the reference deployment. Docker Compose ships for local dev. All state lives in Postgres + Redis — no proprietary backing store.

How do I get the full technical user guide?

Read the in-depth Markdown document RiddleUserguide.md — it covers riddle anatomy, the four outcomes, evaluation pipeline, grading model, speed-pass mechanics, admin controls, and a full worked example with request/response samples.
