
Brighteye: LLM Proxy
In the rapidly evolving world of AI, large language models (LLMs) have become integral to everything from chatbots and code completion to automated content generation. But with great power comes great complexity: how do you ensure every request to these models aligns with your organization’s policies, cost constraints, and security requirements? Enter Brighteye, a lightweight, pluggable proxy layer that provides a single control plane for managing access, quotas, filters, and observability across any LLM provider.
Background: The LLM Explosion
Over the past two years, LLMs like OpenAI’s GPT series and Anthropic’s Claude have delivered unprecedented natural language capabilities to applications. Teams adopt them for quick prototyping, production chatbots, and even internal knowledge assistants. However, uncontrolled consumption quickly leads to runaway costs, token-draining test scripts, and scattered security blind spots.
Key challenges include:
- Decentralized Access Controls: Hard-coded API keys in individual applications make rotation and revocation a nightmare.
- No Unified Quota Management: Different projects compete for the same credits, often leading to service disruptions when budgets are exhausted.
- Fragmented Observability: Without consolidated metrics, spotting sudden spikes or filtering issues requires complex logging pipelines.
- Prompt Injection Risks: Sensitive information or malicious inputs might slip through unfiltered, exposing data or triggering harmful operations upstream.
Brighteye was born to address these gaps with a single, YAML-driven configuration and zero changes to existing application code.
Deep Dive: How Brighteye Works
At its core, Brighteye sits as a reverse proxy in front of all LLM API calls. Here’s the architecture flow (a minimal sketch of the check pipeline follows the list):
- Initialization: On startup, Brighteye loads brighteye.yaml, which defines entry points, provider endpoints, default policies, and key groups.
- Request Handling: Incoming requests on port 6789 are inspected for:
  - Provider and Model: Extracted from query parameters or the JSON body.
  - API Key Validation: Checked against the defined key groups.
  - Prompt Filtering: Blocklists and regex patterns applied before forwarding.
- Policy Enforcement:
  - Rate Limiting: Requests-per-minute and burst controls enforced per key or group.
  - Quota Checks: Daily token usage is tallied; requests are denied once a quota is exhausted.
- Upstream Proxy: Only requests that pass all checks are forwarded to the target provider’s base URL via the HTTP client.
- Metrics & Logging:
  - Prometheus counters track requests, blocked prompts, tokens, errors, and more at http://localhost:6790/metrics.
  - Structured JSON logs capture metadata without leaking secrets.
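To make the order of these checks concrete, here is a minimal sketch of the pipeline in Python. It is illustrative only: the function names and the token-bucket limiter are assumptions made for the example, not Brighteye’s actual internals; the policy values mirror the defaultPolicy shown in the configuration below.

import re
import time

# Illustrative policy values mirroring the defaultPolicy in the configuration below.
BLOCKLIST = ["harmful content", "inappropriate request"]
REGEX_PATTERNS = [re.compile(p) for p in (r"(?i)password", r"(?i)secret")]
MAX_PROMPT_LENGTH = 4000
DAILY_QUOTA = 10000

class PolicyError(Exception):
    """Raised when a request fails a filter, rate-limit, or quota check."""

def check_prompt(prompt: str) -> None:
    # Prompt filtering: length cap, blocklist phrases, regex patterns.
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise PolicyError("prompt too long")
    if any(term in prompt.lower() for term in BLOCKLIST):
        raise PolicyError("blocklisted phrase")
    if any(p.search(prompt) for p in REGEX_PATTERNS):
        raise PolicyError("regex filter matched")

class TokenBucket:
    # Hypothetical requests-per-minute limiter with a burst allowance.
    def __init__(self, rpm: int, burst: int) -> None:
        self.rate = rpm / 60.0          # tokens added per second
        self.capacity = float(burst)    # maximum burst size
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def check_quota(tokens_used_today: int, tokens_requested: int) -> None:
    # Quota check: deny the request once the daily token budget would be exceeded.
    if tokens_used_today + tokens_requested > DAILY_QUOTA:
        raise PolicyError("daily token quota exceeded")

# Checks run in the same order as the flow above: rate limit, filter, quota.
bucket = TokenBucket(rpm=100, burst=10)
if bucket.allow():
    check_prompt("Hello!")
    check_quota(tokens_used_today=9500, tokens_requested=200)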
Configuration Example
version: "1.0.0"
entryPoints:
  web:
    address: 6789
  metrics:
    address: 6790
defaultPolicy:
  rateLimit:
    rpm: 100
    burst: 10
  tokenQuota:
    daily: 10000
  promptFilter:
    blocklist: ["harmful content", "inappropriate request"]
    regex: ["(?i)password", "(?i)secret"]
    maxPromptLength: 4000
providers:
  default: openai
  services:
    - name: openai
      base_url: https://api.openai.com/v1
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      allowedPaths: ["/chat/completions", "/completions"]
    - name: anthropic
      base_url: https://api.anthropic.com/v1
      models: ["claude-3-haiku", "claude-3-sonnet", "claude-3-opus"]
      allowedPaths: ["/messages"]
keyGroups:
  - name: development
    apiKeys:
      - key: YOUR-DEV-OPENAI-KEY
        label: Development Team
      - key: YOUR-DEV-CLAUDE-KEY
        label: Testing Environment
    permissions:
      - provider: openai
        models: ["gpt-3.5-turbo", "gpt-4"]
      - provider: anthropic
        models: ["claude-3-haiku"]
    rateLimit: { rpm: 50, burst: 5 }
    tokenQuota: { daily: 5000 }
  - name: production
    apiKeys:
      - key: YOUR-PROD-OPENAI-KEY
        label: Production OPENAI KEY
      - key: YOUR-PROD-CLAUDE-KEY
        label: Production CLAUDE KEY
    permissions:
      - provider: openai
        models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      - provider: anthropic
        models: ["claude-3-haiku", "claude-3-sonnet", "claude-3-opus"]
    rateLimit: { rpm: 200, burst: 20 }
    tokenQuota: { daily: 20000 }
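To see how the key groups resolve in practice, here is a small illustrative Python snippet that loads the example file with PyYAML and checks whether a given key may call a provider/model pair. The field names come from the configuration above; the loader itself is just a sketch, not how Brighteye reads its config.

import yaml  # pip install pyyaml

# Illustrative only: decide whether an API key may call a provider/model pair,
# based on the keyGroups structure shown in the example configuration above.
def is_allowed(config_path: str, api_key: str, provider: str, model: str) -> bool:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    for group in cfg.get("keyGroups", []):
        if any(entry["key"] == api_key for entry in group.get("apiKeys", [])):
            return any(
                perm["provider"] == provider and model in perm["models"]
                for perm in group.get("permissions", [])
            )
    return False  # unknown keys are rejected

# A development key is limited to gpt-3.5-turbo and gpt-4 above,
# so a gpt-4-turbo request resolves to False.
print(is_allowed("brighteye.yaml", "YOUR-DEV-OPENAI-KEY", "openai", "gpt-4-turbo"))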
Observability & Metrics
Brighteye ships with a rich set of Prometheus metrics, exposed on the metrics entrypoint (port 6790 in the example configuration above); a quick way to read them is sketched after the list:
brighteye_requests_total{provider,model}
brighteye_blocked_prompt_total
brighteye_token_usage_total{api_key,model,group}
brighteye_upstream_errors_total{provider}
brighteye_response_time_seconds_bucket{provider,model}
brighteye_rate_limit_hits_total{api_key}
brighteye_quota_usage{api_key}
brighteye_active_connections
brighteye_request_size_bytes_bucket{provider}
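Because the counters are exposed in the standard Prometheus text format, you can sanity-check them before wiring up a full Prometheus/Grafana stack. A rough Python sketch, assuming the metrics entrypoint from the example configuration on port 6790:

from collections import defaultdict
from urllib.request import urlopen

# Sum brighteye_token_usage_total per key group straight from the metrics endpoint.
# Assumes the metrics entrypoint from the example configuration (port 6790).
def token_usage_by_group(url: str = "http://localhost:6790/metrics") -> dict:
    usage = defaultdict(float)
    for line in urlopen(url).read().decode().splitlines():
        if line.startswith("brighteye_token_usage_total{"):
            labels, value = line.rsplit(" ", 1)
            group = labels.split('group="')[1].split('"')[0]  # extract the group label
            usage[group] += float(value)
    return dict(usage)

print(token_usage_by_group())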
Getting Started
Docker Quick-Start
docker pull ghcr.io/mehmetymw/brighteye:latest
docker run -d \
  --name brighteye \
  -p 6789:6789 \
  -p 6790:6790 \
  -v "$(pwd)/brighteye.yaml:/app/brighteye.yaml" \
  ghcr.io/mehmetymw/brighteye:latest
Sample Request
curl -X POST 'http://localhost:6789/chat/completions?provider=openai' \
  -H 'Authorization: Bearer YOUR-KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello!"}]}'
Use Cases
- Dev & Test Environments: Isolate non-prod workloads with limited quotas.
- Production Services: Guarantee SLAs with higher quotas and strict rate limits.
- Security Audits: Enforce prompt filters to prevent accidental leaks.
- Cost Monitoring: Centralize token usage metrics and alert on overages.
Why Brighteye?
- No Code Changes: Works as a drop-in proxy.
- Policy as Code: All rules in a single YAML file.
- Scalable & Extensible: Add providers or tweak policies on the fly.
Brighteye brings order, security, and clarity to your LLM-driven apps. Try it today and let every request pass your eye first!