Technical Brief · For CIOs, CISOs & Platform Teams

Every AI request, tested before it ships.

OnlyAllowAI sits inline between your AI agents and your LLM providers. Each request must pass a structured competency test, called a Riddle, before any bytes reach the model. This document explains the architecture, latency model, threat coverage, and audit surface at the level of detail your security and platform teams need.

Sub-millisecond repeat-request overhead · Zero buffering on streaming responses · 100% audit trail in Postgres

< 1 ms · Speed Pass Overhead
5–50 ms · First-time Grading
0 ms · Stream Buffering
100% · Decisions Audited

01 · Executive Summary

What it is, in one paragraph.

A drop-in firewall for AI traffic. Point your existing LLM client at an OnlyAllowAI URL and authenticate with an OnlyAllowAI key; the rest of your application code stays as it is. Behind the scenes, every request is tested against a competency contract you define: passing requests stream through with negligible latency, and failing ones are blocked with structured per-field feedback.

Why it matters to the business

1. You stop bad AI calls before they cost you money. Every upstream request to OpenAI / Anthropic / Groq / Google is billed by token. OnlyAllowAI denies the request before the upstream call is opened — you don't pay for blocked traffic.

2. You meet compliance without re-architecting. Every decision is persisted to Postgres with a stable event ID, the API key used, the riddle attempted, and the model targeted. SOC 2, ISO 27001, and internal audit teams can query the firewall_events table directly.

3. You keep one kill-switch for every AI agent. A single PATCH /v1/keys/<id> { "disabled": true } stops all traffic from that agent — instantly, across every Cloud Run replica, with no deploy.

Contents

  1. Executive Summary
  2. Architecture
  3. Decision Model (the four outcomes)
  4. Request Lifecycle
  5. Grading Model
  6. Speed Pass
  7. Threat Coverage
  8. Performance & Scale
  9. Compliance & Audit
  10. Integration
  10b. Developer Reference: endpoints, errors, SDKs, SSE, BYOK
  11. FAQ

02 · Architecture

Inline gate, decoupled observation.

The hot path runs the gate decision and the upstream stream on a single coroutine — no queues, no thread hops. The dashboard reads a separate event bus and can never stall a customer request.

AI Agent (oaai-sk-… key)
  → ONLYALLOWAI PROXY · HOT PATH
      ① Auth: binding snapshot · 0 DB hits
      ② Cert Cache: Redis · < 1 ms
      ③ Riddle Store: in-memory · 0 DB hits
      ④ AutoFormatter + Grader: field-by-field validation · 5–50 ms
      ⑤ Decision + Mint Cert: PASS · DENY · SPEED_PASS
      ⑥ Stream Forward (httpx.aiter_raw): byte-for-byte · zero buffering · nginx proxy_buffering off
  → Upstream LLM: OpenAI · Anthropic · Groq · Google · xAI
  → 403 + Feedback: DENIED · upstream never hit
  ⑦ EventBus → SSE (Looking Glass Dashboard): non-blocking · put_nowait · drop-oldest on overflow

🛰️

Inline, in-process

The gate decision runs on the same async coroutine that opens the upstream connection. No IPC, no queue, no second hop.

Zero buffering

Once the gate passes, the response is streamed byte-for-byte via httpx.aiter_raw() + FastAPI StreamingResponse.

🪟

Out-of-band telemetry

The Looking Glass dashboard reads a separate EventBus. Close every browser tab and the proxy doesn't notice.
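
The non-blocking, drop-oldest behaviour described for the EventBus can be illustrated with a bounded queue per subscriber. This is a minimal sketch, not the product's implementation: the `EventBus` class, its method names, and the queue size are assumptions made for illustration. The only behaviours taken from the text are `put_nowait` and dropping the oldest event on overflow.

```python
import asyncio


class EventBus:
    """Fan-out bus: one bounded queue per subscriber, never blocks the publisher."""

    def __init__(self, max_queue: int = 1000) -> None:
        self._max_queue = max_queue
        self._subscribers = []

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=self._max_queue)
        self._subscribers.append(q)
        return q

    def publish(self, event: dict) -> None:
        # Synchronous on purpose: the hot path never awaits a slow consumer.
        for q in self._subscribers:
            if q.full():
                q.get_nowait()  # drop the oldest event to make room
            q.put_nowait(event)


bus = EventBus(max_queue=2)      # tiny queue so the overflow is visible
dashboard = bus.subscribe()
for i in range(5):
    bus.publish({"event_id": i})

received = [dashboard.get_nowait()["event_id"] for _ in range(dashboard.qsize())]
print(received)  # [3, 4]: only the two most recent events survive
```

Because `publish` never awaits, a stalled dashboard can only lose its own oldest events; it cannot add latency to the request that emitted them.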

03 · Decision Model

Every request resolves to one of four outcomes.

Three of these allow the request through with zero buffering. The fourth blocks the request before the upstream LLM is contacted — so denied traffic costs you nothing in upstream tokens.

| Outcome | Meaning | What happens | Overhead · action |
|---|---|---|---|
| SPEED_PASS | Cached competence | The agent already cleared this domain; cert in cache. | < 1 ms · forward |
| NO_RIDDLE | No rule attached | The asset has no riddle defined; allow by default. | < 1 ms · forward (EXPOSED) |
| PASSED | Riddle solved | 100% of expected fields correct; cert issued (TTL 1h). | 5–50 ms · forward |
| DENIED | Riddle failed | Any field wrong → 403 with per-field feedback. Upstream never called. | 5–50 ms · blocked |

04 · Request Lifecycle

From bearer token to upstream byte — every step.

Every step below runs on the same coroutine. Between the gate decision and the upstream forward (⑤ and ⑥ in the architecture diagram) the proxy holds no lock, no database connection, and no Vault handle.

  1. Resolve agent + domain. The auth dependency snapshots the API-key binding tuple (org_id, department_id, asset_id, riddle_id, provider, disabled) onto request.state.api_key_binding. (no DB hit on hot path)

  2. Disabled-key short circuit. If api_keys.disabled = true, immediately return 403 agent_disabled and emit proxy.blocked. No riddle is even pulled.

  3. Rate limit check. Sliding-window ZSET in Redis: per-org 60 req/min + per-agent 30 req/min. In-memory fallback if Redis is unavailable.

  4. Certificate cache lookup. Key: oaai:cert:{agent_id}:{gate_domain}. Cache hit → SPEED_PASS, riddle is never selected, grading never runs. (sub-millisecond hit)

  5. Riddle selection. Resolution order: bound_riddle_id, then the asset_id → riddle_id Redis cache (TTL 60s), then in-memory RiddleStore.select_for_challenge(domain, difficulty). No SQL on the repeat path.

  6. Auto-format the agent's answer. AutoFormatter normalises raw output: strips markdown code fences, tries JSON, falls back to line-by-line key: value extraction. Returns a clean dict.

  7. Grade each field. GateHandshake.evaluate() runs the OutputValidator across every expected_output using its declared match_type (exact / contains / regex). Score = correct ÷ total. Verdict = PASS only when score == 1.0.

  8. Mint token + issue certificate. On PASS: signed GateToken (JWT, TTL 5 min) for scope access:<domain> + CapabilityCertificate cached in Redis for the next 1 hour. (enables Speed Pass)

  9. Emit firewall event. One stable event_id per real request → in-memory subscribers (SSE) and Postgres firewall_events table. queue.put_nowait() never blocks the proxy.

  10. Stream upstream byte-for-byte. httpx.AsyncClient.stream("POST", upstream, …) + async for chunk in resp.aiter_raw(): yield chunk. nginx is configured with proxy_buffering off and Cloud Run with --timeout 900s for long generations. (zero buffering, SSE-native)

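
The auto-format step in the lifecycle above (strip code fences, try JSON, fall back to key: value lines) can be sketched as follows. This is a hedged illustration, not the AutoFormatter source: the function name `auto_format`, both regular expressions, and the rule that keys must look like identifiers are assumptions.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker
FENCE_RE = re.compile(rf"^{FENCE}[\w-]*[ \t]*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)
KV_RE = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*:\s*(.*?)\s*$")


def auto_format(raw: str) -> dict:
    """Normalise raw agent output into a flat dict (hypothetical helper)."""
    text = raw.strip()

    # 1. Strip a surrounding markdown code fence, if present.
    fenced = FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1).strip()

    # 2. Try JSON first; only a JSON object counts as an answer dict.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # 3. Fall back to line-by-line "key: value" extraction.
    #    Lines whose key is not identifier-like (narrative text) are skipped.
    result = {}
    for line in text.splitlines():
        kv = KV_RE.match(line)
        if kv:
            result[kv.group(1)] = kv.group(2)
    return result


fenced_answer = FENCE + "json\n" + '{"project_id": "acme-prod"}' + "\n" + FENCE
print(auto_format(fenced_answer))
print(auto_format("Sure, here you go:\nproject_id: acme-prod\nregion: us-central1"))
```

In the second call the narrative line is dropped because its key is not identifier-like, so only the two answer fields reach the grader.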
05 · Grading Model

Deterministic, per-field, no LLM-in-the-loop.

Grading is pure CPU. There is no AI judging another AI — that would be slow, expensive, and non-deterministic. Instead each riddle ships with an answer key and three match strategies.

🟰

exact

String-normalised equality. str(submitted).strip() == str(expected).strip(). Perfect for IDs, project names, version numbers.

🔍

contains

Substring check: expected in submitted. Use for bucket names, partial paths, or "must mention X".

⚙️

regex

Pattern match: re.search(pattern, submitted). Use for IPs, semver ranges, free-form constraints.

example_riddle.json: expected output spec for a GCP project config extraction riddle
{
  "gate_domain": "cloud_infrastructure",
  "difficulty": "standard",
  "prompt": "PROJECT_ID=acme-prod\nREGION=us-central1\nBUCKET=acme-prod-data",
  "expected_outputs": [
    { "field_name": "project_id", "expected_value": "acme-prod",    "match_type": "exact"    },
    { "field_name": "region",     "expected_value": "us-central1",  "match_type": "exact"    },
    { "field_name": "bucket",     "expected_value": "acme-prod",    "match_type": "contains" }
  ]
}

A score of less than 1.0 is a failure. Partial credit is recorded for training feedback but the gate does not open. The 403 response carries a feedback object with one entry per field — so the calling agent can self-correct deterministically.
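
To make the all-or-nothing rule concrete, here is a minimal grading sketch applied to the example riddle above. It is an illustration under assumptions, not the GateHandshake or OutputValidator code: the function names, the shape of the feedback entries, and the choice to store a regex pattern in expected_value are all hypothetical.

```python
import re


def field_matches(match_type: str, expected: str, submitted) -> bool:
    """Apply one of the three match strategies to a single field."""
    value = str(submitted).strip()
    if match_type == "exact":
        return value == str(expected).strip()
    if match_type == "contains":
        return str(expected) in value
    if match_type == "regex":
        return re.search(expected, value) is not None
    raise ValueError(f"unknown match_type: {match_type}")


def grade(expected_outputs: list, answer: dict) -> dict:
    """Grade every expected field; PASS only when all of them are correct."""
    feedback = []
    for spec in expected_outputs:
        name = spec["field_name"]
        present = name in answer
        correct = present and field_matches(
            spec["match_type"], spec["expected_value"], answer[name]
        )
        feedback.append({"field_name": name, "correct": correct, "missing": not present})
    score = sum(f["correct"] for f in feedback) / len(feedback) if feedback else 0.0
    verdict = "PASS" if score == 1.0 else "DENY"
    return {"verdict": verdict, "score": score, "feedback": feedback}


riddle_outputs = [
    {"field_name": "project_id", "expected_value": "acme-prod", "match_type": "exact"},
    {"field_name": "region", "expected_value": "us-central1", "match_type": "exact"},
    {"field_name": "bucket", "expected_value": "acme-prod", "match_type": "contains"},
]
result = grade(
    riddle_outputs,
    {"project_id": "acme-prod", "region": "us-east1", "bucket": "acme-prod-data"},
)
print(result["verdict"], [f["field_name"] for f in result["feedback"] if not f["correct"]])
# DENY ['region']
```

Two of three fields are right, so the score is 2/3 and the gate stays closed; the feedback list tells the agent exactly which field to fix.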

06 · Speed Pass

How we keep firewalled traffic fast.

The first time an agent passes a riddle for a domain, a Capability Certificate is issued and cached. Every subsequent request for the same (agent_id, domain) pair is allowed through with a single Redis GET — no riddle pulled, no grading run.

stateDiagram-v2
    [*] --> Pending : Agent first request
    Pending --> Challenged : Riddle selected
    Challenged --> Active : Score == 1.0 / cert issued
    Challenged --> Pending : Score < 1.0 / 403 + feedback
    Active --> SPEED_PASS : Subsequent request hits cache
    Active --> Expired : TTL elapsed (default 1h)
    Active --> Revoked : Admin revokes / riddle edited
    Expired --> Challenged : Re-challenge on next call
    Revoked --> Challenged : Re-challenge on next call
    SPEED_PASS --> Active : Cache still warm
      
📦

Redis-backed

Production uses RedisTokenManager — certs are shared across every Cloud Run replica. No cold-cert penalty on autoscale.

⏱️

Bounded TTL

Default 1-hour TTL keeps the privilege window small. Adjust per-org or per-domain if your risk model requires shorter intervals.

🔥

Auto-revocation

Edit a riddle → every cert that solved it is revoked. Disable an API key → the next request is denied before grading.
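
The certificate lifecycle (issue on pass, Speed Pass on cache hit, expiry after the TTL, revocation when a riddle is edited) can be sketched with an in-memory stand-in for the Redis cache. Everything except the key format oaai:cert:{agent_id}:{gate_domain} and the 1-hour default TTL is an assumption: the `CertCache` class, its method names, and the injectable clock are hypothetical and exist only to make the example runnable without Redis.

```python
import time


class CertCache:
    """In-memory stand-in for the Redis cert cache (illustrative only)."""

    def __init__(self, ttl_seconds: int = 3600, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._certs = {}  # key -> (riddle_id, expires_at)

    @staticmethod
    def key(agent_id: str, gate_domain: str) -> str:
        return f"oaai:cert:{agent_id}:{gate_domain}"

    def issue(self, agent_id: str, gate_domain: str, riddle_id: str) -> None:
        expires_at = self._clock() + self._ttl
        self._certs[self.key(agent_id, gate_domain)] = (riddle_id, expires_at)

    def has_speed_pass(self, agent_id: str, gate_domain: str) -> bool:
        k = self.key(agent_id, gate_domain)
        entry = self._certs.get(k)
        if entry is None:
            return False
        if self._clock() >= entry[1]:
            del self._certs[k]  # TTL elapsed: force a re-challenge
            return False
        return True

    def revoke_for_riddle(self, riddle_id: str) -> int:
        """Drop every cert earned by solving the given riddle; return the count."""
        stale = [k for k, (rid, _) in self._certs.items() if rid == riddle_id]
        for k in stale:
            del self._certs[k]
        return len(stale)


now = [0.0]  # fake clock so the TTL can be exercised instantly
cache = CertCache(ttl_seconds=3600, clock=lambda: now[0])
cache.issue("agent-1", "cloud_infrastructure", "r-555")
first = cache.has_speed_pass("agent-1", "cloud_infrastructure")
now[0] = 3601.0
after_ttl = cache.has_speed_pass("agent-1", "cloud_infrastructure")
cache.issue("agent-1", "cloud_infrastructure", "r-555")
revoked = cache.revoke_for_riddle("r-555")
after_revoke = cache.has_speed_pass("agent-1", "cloud_infrastructure")
print(first, after_ttl, revoked, after_revoke)  # True False 1 False
```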

07 · Threat Coverage

What the gate stops — and how.

The Riddle Firewall is a contract enforcer. It does not try to out-think a malicious prompt — it requires the AI to prove it can extract the right values from a known input.

| Threat | How it shows up | How we stop it |
|---|---|---|
| Prompt injection | Hostile text smuggled into context tries to make the AI ignore guardrails. | AutoFormatter strips everything except the answer dict. Any extra narrative is discarded before grading. |
| Capability drift | A model that used to extract project_id correctly suddenly hallucinates one. | The riddle is re-graded on every cert expiration. Drift is caught on the next refresh, not at the customer. |
| Compromised agent | Stolen oaai-sk-… key being used by an unauthorised process. | PATCH /v1/keys/<id> { "disabled": true }: instant, global, no deploy. proxy.blocked event for forensics. |
| Provider hop | Agent bound to OpenAI tries to call Anthropic to bypass a policy. | Provider lock on the API key → mismatched model → 403 provider_mismatch. |
| Runaway cost | An agent loops, billing thousands of upstream tokens. | Rate limits (Redis ZSET) + per-org credit ledger. Blocked requests cost zero upstream tokens. |
| Silent failure | Bad AI output reaches production unnoticed. | Every decision is persisted to firewall_events with the riddle, score, feedback, and elapsed time. |
| Stale privilege | Riddle is tightened but agents keep using old certs. | Riddle update auto-revokes every cert that solved the old version (Redis SCAN + in-memory sweep). |

08 · Performance & Scale

Designed for LLM-request volume.

p50 overhead

< 1 ms on the speed-pass path. 5–50 ms on the cold-grade path. Streaming response start time is dominated by the upstream LLM, not by us.

🔄

Concurrency

Async coroutines on every request. Cloud Run autoscales horizontally; Redis-shared cert cache means a SPEED_PASS earned on one replica is honoured by every other.

🚦

Rate limiting

Sliding-window ZSET in Redis. Per-org 60 req/min + per-agent 30 req/min by default; both tunable per-org.
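
A sliding-window limiter of the kind described here can be sketched in memory, which is roughly what a fallback path would need when Redis is unavailable. This is an assumption-laden illustration: the `SlidingWindowLimiter` class and its injectable clock are hypothetical, and a deque of timestamps stands in for the Redis sorted set.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key inside a moving time window."""

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps still inside the window

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Evict timestamps that have slid out of the window
        # (with a Redis ZSET this would be a ZREMRANGEBYSCORE call).
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


now = [0.0]  # fake clock so the window can be advanced instantly
per_agent = SlidingWindowLimiter(limit=30, window_seconds=60, clock=lambda: now[0])
results = [per_agent.allow("agent-1") for _ in range(31)]
now[0] = 60.0
after_window = per_agent.allow("agent-1")
print(results.count(True), results[-1], after_window)  # 30 False True
```

The 31st request inside the same minute is rejected; once the window has moved past the earlier hits, the agent is admitted again.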

📡

Streaming

httpx.aiter_raw() + StreamingResponse. nginx proxy_buffering off, Cloud Run --timeout 900s. Long generations stream uninterrupted.

🔌

No DB on hot path

Auth binding snapshotted by the auth dependency. The only residual SQL is BYOK provider-key lookup — one short-lived session, closed before the upstream stream starts.

🎯

Anthropic normalisation

Anthropic Messages SSE is rewritten on-the-wire as OpenAI chat.completion.chunk SSE, so existing OpenAI/LiteLLM SDKs work unchanged through the firewall.
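
The normalisation described above amounts to mapping each Anthropic stream event onto an OpenAI-style chunk. The sketch below shows one plausible mapping for text deltas and stop reasons only; it is not the proxy's code. The function names and the stop-reason table are assumptions, and tool-use or other content block types are deliberately left out.

```python
import json
import time

# Assumed mapping from Anthropic stop reasons to OpenAI finish reasons.
STOP_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_openai_chunk(event: dict, chunk_id: str, model: str):
    """Map one Anthropic stream event to an OpenAI-style chunk, or None to skip it."""
    etype = event.get("type")
    delta_in = event.get("delta", {})
    if etype == "content_block_delta" and delta_in.get("type") == "text_delta":
        delta, finish = {"content": delta_in["text"]}, None
    elif etype == "message_delta" and delta_in.get("stop_reason"):
        delta, finish = {}, STOP_REASONS.get(delta_in["stop_reason"], "stop")
    else:
        return None  # ping, message_start, content_block_start/stop, ...
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
    }


def to_sse(chunk: dict) -> str:
    """Serialise a chunk as a single SSE data frame."""
    return f"data: {json.dumps(chunk)}\n\n"


text_event = {
    "type": "content_block_delta",
    "index": 0,
    "delta": {"type": "text_delta", "text": "Hello"},
}
chunk = to_openai_chunk(text_event, "chatcmpl-demo", "claude-3-5-sonnet")
print(to_sse(chunk), end="")
```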

09 · Compliance & Audit

Every decision is queryable.

Every SPEED_PASS, PASSED, DENIED, and proxy.blocked event is persisted to a Postgres table. Your SOC 2 / ISO 27001 / internal audit team gets the same view as your operators.

firewall_events table — migration 010
CREATE TABLE firewall_events (
    event_id     UUID      PRIMARY KEY,
    org_id       UUID      NOT NULL REFERENCES organizations,
    event_type   VARCHAR,  -- riddle.passed / proxy.blocked / firewall.allow ...
    agent_id     VARCHAR,
    api_key_id   UUID,
    gate_domain  VARCHAR,
    riddle_id    UUID,
    outcome      VARCHAR,  -- passed / failed / speed_pass / no_riddle / blocked
    model        VARCHAR,  -- gpt-4o, claude-3-5-sonnet, ...
    provider     VARCHAR,  -- openai / anthropic / groq / google / xai
    elapsed_ms   INTEGER,
    payload      JSONB,    -- full enriched event (agent, dept, asset, feedback)
    created_at   TIMESTAMPTZ DEFAULT NOW()
);
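
As an example of the kind of question an auditor might ask of this table, the sketch below counts riddle failures per agent for one organisation. To stay self-contained it uses an in-memory SQLite database with a cut-down copy of the schema and made-up rows; against the real Postgres table the same SELECT would be issued through your Postgres driver, whose placeholder syntax may differ from the `?` used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Postgres table: only the columns this query needs.
conn.execute(
    "CREATE TABLE firewall_events ("
    " event_id TEXT PRIMARY KEY, org_id TEXT NOT NULL,"
    " agent_id TEXT, gate_domain TEXT, outcome TEXT, elapsed_ms INTEGER)"
)
sample_rows = [  # fabricated demo data, not real events
    ("e1", "org-1", "agent-a", "analytics", "speed_pass", 1),
    ("e2", "org-1", "agent-a", "analytics", "failed", 12),
    ("e3", "org-1", "agent-b", "analytics", "failed", 9),
    ("e4", "org-1", "agent-b", "analytics", "failed", 15),
    ("e5", "org-2", "agent-z", "analytics", "failed", 7),
]
conn.executemany("INSERT INTO firewall_events VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# Audit question: which agents in an org fail riddles most often?
query = """
    SELECT agent_id, COUNT(*) AS failures, AVG(elapsed_ms) AS avg_ms
    FROM firewall_events
    WHERE org_id = ? AND outcome = 'failed'
    GROUP BY agent_id
    ORDER BY failures DESC, agent_id
"""
report = conn.execute(query, ("org-1",)).fetchall()
print(report)  # [('agent-b', 2, 12.0), ('agent-a', 1, 12.0)]
```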
👤

User Accountability

Every human action (create / edit / delete riddle, toggle API key, change settings) is logged to user_audit_log with IP address and target.

🤖

AI Accountability

Every AI attempt is logged to attempts with submitted_outputs, score, and feedback. Per-agent and per-riddle indexes for fast queries.

🔐

Secrets handling

BYOK provider keys are AES-encrypted at rest (core/key_crypto.py). Inspector ring buffer redacts Authorization / api_key patterns before storage.

10 · Integration

One line of code.

Change the base URL of your OpenAI / Anthropic client to point at api.onlyallow.ai and swap in your OnlyAllowAI key. Requests that must solve a riddle also carry the answer in oaai_answer. Your existing SDKs, retry logic, streaming code paths, and observability all continue to work.

integration.py — drop-in replacement
from openai import OpenAI

# BEFORE — going direct to OpenAI
# client = OpenAI(api_key="sk-...")

# AFTER — same SDK, firewalled traffic
client = OpenAI(
    api_key="oaai-sk-...",                              # your OnlyAllowAI key
    base_url="https://api.onlyallow.ai/v1",            # <— one line changed
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "…"}],
    stream=True,
    extra_body={"oaai_answer": {"project_id": "acme-prod"}},  # riddle answer
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Same API surface, three transparent guarantees: (1) provider lock on the key forces requests to the correct upstream; (2) Speed Pass keeps repeat-request overhead negligible; (3) every call is recorded with a stable event_id for audit.

10b · Developer Reference

Everything an engineer needs to integrate.

Endpoints, authentication, error codes, streaming, SSE event payloads, rate limits, BYOK setup, and SDK snippets across four languages. Bookmark this section.

Base URLs & Authentication

🌐

Production

https://api.onlyallow.ai — same-origin reverse proxy through nginx to Cloud Run. Use this for SDK base_url and all production traffic.

🧪

Direct Cloud Run

https://onlyallow-api-47084672302.us-central1.run.app — useful for staging tests when bypassing nginx. Skips the SSE-aware buffer-off proxy block.

🔑

Auth header

Authorization: Bearer oaai-sk-… on every request. JWTs are used internally for the gate-handshake flow; you don't see them.

Core Endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat. Streams when stream: true. Anthropic / Groq / Google / xAI / Ollama all transparently dispatched by model name. Carries the riddle answer in the oaai_answer body field (extra_body.oaai_answer in the Python SDK). |
| POST | /v1/messages | Anthropic-native messages endpoint. SSE rewritten on-the-wire to OpenAI chunks if your client expects them. |
| GET | /v1/events/stream | Server-Sent Events firehose: one event per firewall decision for the authenticated org. Used by the Looking Glass dashboard. |
| GET | /v1/events/stats | Aggregate counters (24h window): total, forwarded, blocked, speed_pass, pass_rate, by_provider, by_department, by_outcome. |
| GET | /v1/keys/oaai/ | List OAAI keys for the current org with bindings. |
| POST | /v1/keys/oaai/ | Issue new OAAI key. Body: { name, department_id?, asset_id?, riddle_id?, provider? }. |
| PATCH | /v1/keys/<id> | Update bindings or set disabled: true. Disabling takes effect on the next request, globally, with no deploy. |
| POST | /v1/riddles | Create or update a riddle. Bumps version and auto-revokes prior certs. |
| GET | /health · /auth/health | Liveness probes: no auth, no DB, no Redis. Return {"status":"ok"}. Safe for Kubernetes / Cloud Run liveness checks. |

Error Codes

| HTTP | error field | Meaning & remediation |
|---|---|---|
| 401 | invalid_api_key | Missing or malformed Authorization header. Check the prefix is oaai-sk-. |
| 403 | agent_disabled | The OAAI key is disabled. Re-enable via PATCH /v1/keys/<id> { "disabled": false }. |
| 403 | riddle_failed | Field-level feedback object in the response body shows the mismatch per expected_output. |
| 403 | provider_mismatch | Model name resolves to a different provider than the key's provider_bound. Either remove the lock or use a matching model. |
| 403 | require_riddle | Org has deny-by-default enabled and the asset has no riddle attached. Attach one or relax the policy. |
| 429 | rate_limited | Sliding-window limit hit. Response includes retry_after_ms. Default 60 req/min/org & 30 req/min/agent. |
| 502 | upstream_error | Upstream LLM returned a transport failure. Original status & body included in upstream. |
| 503 | quota_exhausted | Org credit ledger balance ≤ 0. Refill via /v1/billing/topup. |
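
On the calling side, the error codes above suggest three broad reactions: wait and retry, correct the riddle answer, or escalate to an operator. The sketch below shows one way an agent might classify a response. It is hypothetical throughout: the function name, the returned action labels, the 1-second default delay, and the assumed response shape (an error string plus a feedback list of {field_name, correct} entries) are not taken from the API specification.

```python
def next_action(status: int, body: dict) -> tuple:
    """Decide what a calling agent should do with a firewall error response."""
    error = body.get("error")

    if status == 429 and error == "rate_limited":
        # Honour the server-provided delay; fall back to 1 s if it is absent.
        return ("retry_after", body.get("retry_after_ms", 1000) / 1000.0)

    if status == 403 and error == "riddle_failed":
        # Collect the fields the grader marked wrong so the agent can self-correct.
        wrong = [f["field_name"] for f in body.get("feedback", []) if not f.get("correct")]
        return ("fix_fields", wrong)

    # agent_disabled, provider_mismatch, require_riddle, invalid_api_key,
    # upstream_error, quota_exhausted: retrying unchanged will not help.
    return ("escalate", error)


print(next_action(429, {"error": "rate_limited", "retry_after_ms": 2500}))
print(next_action(403, {
    "error": "riddle_failed",
    "feedback": [
        {"field_name": "project_id", "correct": True},
        {"field_name": "region", "correct": False},
    ],
}))
print(next_action(403, {"error": "agent_disabled"}))
```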

SSE Event Payload (firehose)

GET /v1/events/stream — SSE chunk
event: firewall.allow
data: {
  "event_id":        "emo-1063",
  "org_id":          "a1b2c3…",
  "user_id":         "u-456",
  "user_email":      "ops@acme.com",
  "agent_id":        "agent-llama-70b",
  "api_key_id":      "k-789",
  "api_key_name":    "prod-llama-key",
  "api_key_prefix":  "oaai-sk-aBc1",
  "domain":          "analytics",
  "provider":        "ollama",
  "provider_bound":  "ollama",
  "model":           "llama-3.1-70b",
  "department":      "dept-001",
  "department_name": "Analytics",
  "asset":           "asset-022",
  "asset_name":      "looker-dashboards",
  "riddle_id":       "r-555",
  "module_type":     "human",         // human-bound vs ai-assigned
  "outcome":         "speed_pass",    // passed / failed / speed_pass / no_riddle / blocked
  "elapsed_ms":      1,
  "created_at":      "2026-05-17T12:39:48Z"
}

Event types emitted: riddle.passed, riddle.failed, proxy.forward, proxy.blocked, firewall.allow, firewall.deny. Every event lands both on the SSE bus and in the firewall_events table.

SDK Snippets

cURL — one-shot chat completion
curl -N https://api.onlyallow.ai/v1/chat/completions \
  -H "Authorization: Bearer oaai-sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [{"role":"user","content":"Summarise the project."}],
    "oaai_answer": {"project_id":"acme-prod","region":"us-central1"}
  }'
Node.js — OpenAI SDK + streaming
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OAAI_KEY,
  baseURL: "https://api.onlyallow.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-3-5-sonnet",
  stream: true,
  messages: [{ role: "user", content: "…" }],
  oaai_answer: { project_id: "acme-prod" },          // riddle answer
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
Python — consuming the SSE firehose
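import asyncio, json, os
import httpx

OAAI_KEY = os.environ["OAAI_KEY"]

def alert_siem(evt):
    print("ALERT", evt)  # placeholder: forward to your SIEM

async def main():
    # timeout=None keeps the long-lived SSE connection open
    async with httpx.AsyncClient(timeout=None) as c:
        async with c.stream(
            "GET",
            "https://api.onlyallow.ai/v1/events/stream",
            headers={"Authorization": f"Bearer {OAAI_KEY}"},
        ) as r:
            async for line in r.aiter_lines():
                if line.startswith("data: "):
                    evt = json.loads(line[6:])
                    if evt.get("outcome") == "blocked":
                        alert_siem(evt)

asyncio.run(main())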
Go — admin: disable a compromised key (kill-switch)
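package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// killKey disables an OAAI key via the admin API (kill-switch).
func killKey(adminToken, keyID string) error {
    body := []byte(`{"disabled": true}`)
    req, err := http.NewRequest(
        http.MethodPatch,
        "https://api.onlyallow.ai/v1/keys/"+keyID,
        bytes.NewReader(body),
    )
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer "+adminToken)
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close() // always release the connection
    if resp.StatusCode >= 300 {
        return fmt.Errorf("disable key %s: unexpected status %s", keyID, resp.Status)
    }
    return nil // effective on the very next request to that key, globally
}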

BYOK Provider Setup

🔐

Where keys live

provider_keys table, AES-encrypted at rest via core/key_crypto.py. Decryption is just-in-time for the upstream request and never logged.

⚙️

Supported providers

openai · anthropic · groq · google · xai · ollama. Detected by model name; lock per-key with provider.

📤

How to register

POST /v1/keys/byok with { provider, api_key, label? }. The key is encrypted before the SQL INSERT — plain bytes never touch the DB.
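
Provider detection by model name and the per-key provider lock can be sketched as a prefix lookup followed by a comparison. The prefix table below is an assumption made for illustration; the actual detection rules are not documented in this brief, and model families served by more than one provider (for example Llama models on Groq or Ollama) are deliberately left out.

```python
from typing import Optional

# Hypothetical prefix table; the real detection rules may differ.
MODEL_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "grok-": "xai",
}


def detect_provider(model: str) -> Optional[str]:
    """Guess the upstream provider from the model name, or None if unknown."""
    for prefix, provider in MODEL_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    return None


def check_provider_lock(model: str, provider_bound: Optional[str]) -> tuple:
    """Return (status, error): (200, None) when allowed, else (403, 'provider_mismatch')."""
    if provider_bound is None:
        return 200, None  # key is not locked to a provider
    if detect_provider(model) != provider_bound:
        return 403, "provider_mismatch"
    return 200, None


print(check_provider_lock("gpt-4o", "openai"))             # (200, None)
print(check_provider_lock("claude-3-5-sonnet", "openai"))  # (403, 'provider_mismatch')
print(check_provider_lock("claude-3-5-sonnet", None))      # (200, None)
```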

Self-Hosting Quickstart

Local dev — docker-compose
# 1. Clone
git clone https://github.com/onlyallowai/onlyallowai.git
cd onlyallowai

# 2. Boot Postgres + Redis + API
docker compose up -d

# 3. Apply migrations
docker compose exec api alembic upgrade head

# 4. Health check
curl http://localhost:8000/health
# → {"status":"ok"}

# 5. Issue your first OAAI key (admin token from .env)
curl -X POST http://localhost:8000/v1/keys/oaai/ \
  -H "Authorization: Bearer $ADMIN" \
  -H "Content-Type: application/json" \
  -d '{"name":"prod-key","provider":"openai"}'

Reference deployment: Cloud Run + Cloud SQL + Memorystore Redis on GCP, fronted by nginx with proxy_buffering off for SSE. Terraform IaC ships in infra/terraform/ (7 modules) — see the deployment guide for the production wiring.

Rate Limits & Limits That Matter

| Limit | Default | How to change |
|---|---|---|
| Per-org rate | 60 req / minute | Tunable per-org via organizations.rate_limit_per_min. |
| Per-agent rate | 30 req / minute | Sliding-window ZSET in Redis, in-memory fallback when Redis is down. |
| Speed-Pass cert TTL | 3600 s (1 hour) | Env var OAAI_CERT_TTL_SECONDS. |
| Gate-token JWT TTL | 300 s (5 min) | Env var OAAI_GATE_TOKEN_TTL. |
| Upstream timeout | 900 s | Cloud Run --timeout + nginx proxy_read_timeout. |
| SSE ring buffer | 1000 events | In-process; Postgres firewall_events is the durable record. |
| Max payload | 1 MiB request body | nginx client_max_body_size. Streaming responses are unbounded. |

Local Testing Cheatsheet

🧪

Unit + integration

python -m pytest tests/ -q --ignore=tests/test_v2 -m "not db" — same gate deploy.ps1 runs.

🔬

Full DB suite

pytest tests/ -v with Cloud SQL reachable (or Docker Compose Postgres). 209 tests, ~6s.

📈

Coverage

pytest --cov=api --cov=gate_layer --cov=riddle_matrix --cov-report=html. Open htmlcov/index.html.

Versioning & Backwards Compatibility

11 · FAQ

What technical teams ask first.

Does the firewall add latency to streaming responses?

On the speed-pass path: under 1 ms before the first upstream byte is requested. On the cold-grade path: 5–50 ms (pure CPU). Once the gate passes, response chunks stream byte-for-byte via httpx.aiter_raw() with no buffering — we cannot slow down the upstream stream because we don't decode it.

How do you avoid being a single point of failure?
  • Cloud Run autoscales horizontally — multiple replicas, no sticky sessions.
  • Redis cert cache is shared across replicas; if Redis is unavailable, the gate falls back to in-process caching (logs warning).
  • Rate limiter has the same in-memory fallback.
  • Postgres is regional with point-in-time recovery; the hot path holds no DB connections during grading.
  • Health endpoints (/health, /auth/health) don't touch DB or Redis, so a dependency outage cannot make a healthy replica fail its liveness check.
What happens if I call an LLM I don't have a riddle for?

The request resolves to NO_RIDDLE and is forwarded unmodified, with the asset showing as EXPOSED in the dashboard. The default is allow, for backwards compatibility; you opt in to enforcement by attaching a riddle. If your policy requires deny-by-default, flip the org-level require_riddle flag.

How are riddles stored and edited?

Riddles live in Postgres (riddles table) and are mirrored into an in-memory RiddleStore at boot and on every CRUD operation. Edits are picked up by the firewall on the next request — no redeploy required. The version field is auto-bumped on update, which auto-revokes every certificate that was earned by solving the previous version.

What happens to in-flight requests if I revoke a cert?

In-flight streams complete normally; we never interrupt the byte channel. The next request from that agent for that domain pays the full grading cost. This preserves the no-buffering guarantee while still giving operators a kill switch. The stronger switch, PATCH /v1/keys/<id> { "disabled": true }, behaves the same way: it blocks new requests, not bytes already mid-flight.

How do you handle BYOK (bring-your-own-key) providers?

Each org can register provider keys for OpenAI / Anthropic / Groq / Google / xAI / Ollama. Keys are AES-encrypted at rest using core/key_crypto.py. The auth-bound OAAI key can be locked to one provider — mismatched model requests are denied with 403 provider_mismatch.

Is the dashboard required for the firewall to work?

No. The dashboard reads a separate event bus and can never stall a request. Close every browser tab and the gate keeps working, riddles still enforce, Speed Pass still fires, audit rows still land in Postgres.

Can I self-host?

Yes. Terraform IaC ships in infra/terraform/ (7 modules). Cloud Run + Cloud SQL + Memorystore Redis on GCP is the reference deployment. Docker Compose ships for local dev. All state lives in Postgres + Redis — no proprietary backing store.

How do I get the full technical user guide?

Read the in-depth Markdown document RiddleUserguide.md — it covers riddle anatomy, the four outcomes, evaluation pipeline, grading model, speed-pass mechanics, admin controls, and a full worked example with request/response samples.

Ready to firewall your AI?

Spin up an org in 60 seconds, attach your first riddle, point your OpenAI SDK at api.onlyallow.ai — and watch every request light up the Looking Glass.
