
Brighteye: LLM Proxy
In the rapidly evolving world of AI, large language models (LLMs) have become integral to everything from chatbots and code completion to automated content generation. But with great power comes great complexity: how do you ensure every request to these models aligns with your organization’s policies, cost constraints, and security requirements? Enter Brighteye, a lightweight, pluggable proxy layer that provides a single control plane for managing access, quotas, filters, and observability across any LLM provider.
Background: The LLM Explosion
Over the past two years, LLMs like OpenAI’s GPT series and Anthropic’s Claude have delivered unprecedented natural language capabilities to applications. Teams adopt them for quick prototyping, production chatbots, and even internal knowledge assistants. However, uncontrolled consumption quickly leads to runaway costs, token-draining test scripts, and scattered security blind spots.
Key challenges include:
- Decentralized Access Controls: Hard-coded API keys in individual applications make rotation and revocation a nightmare.
- No Unified Quota Management: Different projects compete for the same credits, often leading to service disruptions when budgets are exhausted.
- Fragmented Observability: Without consolidated metrics, spotting sudden spikes or filtering issues requires complex logging pipelines.
- Prompt Injection Risks: Sensitive information or malicious inputs might slip through unfiltered, exposing data or triggering harmful operations upstream.
Brighteye was born to address these gaps with a single, YAML-driven configuration and zero changes to existing application code.
Deep Dive: How Brighteye Works
At its core, Brighteye sits as a reverse proxy in front of all LLM API calls. Here’s the architecture flow (a minimal sketch of the check pipeline follows the list):
- Initialization: On startup, Brighteye loads brighteye.yaml, which defines entry points, provider endpoints, default policies, and key groups.
- Request Handling: Incoming requests on port 6789 are inspected for:
  - Provider and Model: Extracted from query parameters or the JSON body.
  - API Key Validation: Checked against the defined key groups.
  - Prompt Filtering: Blocklists and regex patterns applied before forwarding.
- Policy Enforcement:
  - Rate Limiting: Requests-per-minute and burst controls enforced per key or group.
  - Quota Checks: Daily token usage is tallied; requests are denied once a quota is exhausted.
- Upstream Proxy: Only requests that pass all checks are forwarded to the target provider’s base URL via the HTTP client.
- Metrics & Logging:
  - Prometheus counters track requests, blocked prompts, tokens, errors, and more at http://localhost:6790/metrics.
  - Structured JSON logs capture metadata without leaking secrets.
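To make the order of these checks concrete, here is a minimal sketch of the pipeline in Python. It is illustrative only: the function names and the token-bucket limiter are assumptions made for the example, not Brighteye’s actual internals; the policy values mirror the defaultPolicy shown in the configuration below.

import re
import time

# Illustrative policy values mirroring the defaultPolicy in the configuration below.
BLOCKLIST = ["harmful content", "inappropriate request"]
REGEX_PATTERNS = [re.compile(p) for p in (r"(?i)password", r"(?i)secret")]
MAX_PROMPT_LENGTH = 4000
DAILY_QUOTA = 10000

class PolicyError(Exception):
    """Raised when a request fails a filter, rate-limit, or quota check."""

def check_prompt(prompt: str) -> None:
    # Prompt filtering: length cap, blocklist phrases, regex patterns.
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise PolicyError("prompt too long")
    if any(term in prompt.lower() for term in BLOCKLIST):
        raise PolicyError("blocklisted phrase")
    if any(p.search(prompt) for p in REGEX_PATTERNS):
        raise PolicyError("regex filter matched")

class TokenBucket:
    # Hypothetical requests-per-minute limiter with a burst allowance.
    def __init__(self, rpm: int, burst: int) -> None:
        self.rate = rpm / 60.0          # tokens added per second
        self.capacity = float(burst)    # maximum burst size
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def check_quota(tokens_used_today: int, tokens_requested: int) -> None:
    # Quota check: deny the request once the daily token budget would be exceeded.
    if tokens_used_today + tokens_requested > DAILY_QUOTA:
        raise PolicyError("daily token quota exceeded")

# Checks run in the same order as the flow above: rate limit, filter, quota.
bucket = TokenBucket(rpm=100, burst=10)
if bucket.allow():
    check_prompt("Hello!")
    check_quota(tokens_used_today=9500, tokens_requested=200)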
Configuration Example
version: "1.0.0"
entryPoints:
  web:
    address: 6789
  metrics:
    address: 6790
defaultPolicy:
  rateLimit:
    rpm: 100
    burst: 10
  tokenQuota:
    daily: 10000
  promptFilter:
    blocklist: ["harmful content", "inappropriate request"]
    regex: ["(?i)password", "(?i)secret"]
    maxPromptLength: 4000
providers:
  default: openai
  services:
    - name: openai
      base_url: https://api.openai.com/v1
      models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      allowedPaths: ["/chat/completions", "/completions"]
    - name: anthropic
      base_url: https://api.anthropic.com/v1
      models: ["claude-3-haiku", "claude-3-sonnet", "claude-3-opus"]
      allowedPaths: ["/messages"]
keyGroups:
  - name: development
    apiKeys:
      - key: YOUR-DEV-OPENAI-KEY
        label: Development Team
      - key: YOUR-DEV-CLAUDE-KEY
        label: Testing Environment
    permissions:
      - provider: openai
        models: ["gpt-3.5-turbo", "gpt-4"]
      - provider: anthropic
        models: ["claude-3-haiku"]
    rateLimit: { rpm: 50, burst: 5 }
    tokenQuota: { daily: 5000 }
  - name: production
    apiKeys:
      - key: YOUR-PROD-OPENAI-KEY
        label: Production OPENAI KEY
      - key: YOUR-PROD-CLAUDE-KEY
        label: Production CLAUDE KEY
    permissions:
      - provider: openai
        models: ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]
      - provider: anthropic
        models: ["claude-3-haiku", "claude-3-sonnet", "claude-3-opus"]
    rateLimit: { rpm: 200, burst: 20 }
    tokenQuota: { daily: 20000 }
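To see how the key groups resolve in practice, here is a small illustrative Python snippet that loads the example file with PyYAML and checks whether a given key may call a provider/model pair. The field names come from the configuration above; the loader itself is just a sketch, not how Brighteye reads its config.

import yaml  # pip install pyyaml

# Illustrative only: decide whether an API key may call a provider/model pair,
# based on the keyGroups structure shown in the example configuration above.
def is_allowed(config_path: str, api_key: str, provider: str, model: str) -> bool:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    for group in cfg.get("keyGroups", []):
        if any(entry["key"] == api_key for entry in group.get("apiKeys", [])):
            return any(
                perm["provider"] == provider and model in perm["models"]
                for perm in group.get("permissions", [])
            )
    return False  # unknown keys are rejected

# A development key is limited to gpt-3.5-turbo and gpt-4 above,
# so a gpt-4-turbo request resolves to False.
print(is_allowed("brighteye.yaml", "YOUR-DEV-OPENAI-KEY", "openai", "gpt-4-turbo"))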
Observability & Metrics
Brighteye ships with a rich set of Prometheus metrics, exposed on the metrics entrypoint (port 6790 in the example configuration above); a quick way to read them is sketched after the list:
brighteye_requests_total{provider,model}
brighteye_blocked_prompt_total
brighteye_token_usage_total{api_key,model,group}
brighteye_upstream_errors_total{provider}
brighteye_response_time_seconds_bucket{provider,model}
brighteye_rate_limit_hits_total{api_key}
brighteye_quota_usage{api_key}
brighteye_active_connections
brighteye_request_size_bytes_bucket{provider}
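Because the counters are exposed in the standard Prometheus text format, you can sanity-check them before wiring up a full Prometheus/Grafana stack. A rough Python sketch, assuming the metrics entrypoint from the example configuration on port 6790:

from collections import defaultdict
from urllib.request import urlopen

# Sum brighteye_token_usage_total per key group straight from the metrics endpoint.
# Assumes the metrics entrypoint from the example configuration (port 6790).
def token_usage_by_group(url: str = "http://localhost:6790/metrics") -> dict:
    usage = defaultdict(float)
    for line in urlopen(url).read().decode().splitlines():
        if line.startswith("brighteye_token_usage_total{"):
            labels, value = line.rsplit(" ", 1)
            group = labels.split('group="')[1].split('"')[0]  # extract the group label
            usage[group] += float(value)
    return dict(usage)

print(token_usage_by_group())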
Getting Started
Docker Quick-Start
docker pull ghcr.io/mehmetymw/brighteye:latest
docker run -d \
  --name brighteye \
  -p 6789:6789 \
  -p 6790:6790 \
  -v "$(pwd)/brighteye.yaml:/app/brighteye.yaml" \
  ghcr.io/mehmetymw/brighteye:latest
Sample Request
curl -X POST 'http://localhost:6789/chat/completions?provider=openai' \
  -H 'Authorization: Bearer YOUR-KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello!"}]}'
Use Cases
- Dev & Test Environments: Isolate non-prod workloads with limited quotas.
- Production Services: Guarantee SLAs with higher quotas and strict rate limits.
- Security Audits: Enforce prompt filters to prevent accidental leaks.
- Cost Monitoring: Centralize token usage metrics and alert on overages.
Why Brighteye?
- No Code Changes: Works as a drop-in proxy.
- Policy as Code: All rules in a single YAML file.
- Scalable & Extensible: Add providers or tweak policies on the fly.
Brighteye brings order, security, and clarity to your LLM-driven apps. Try it today and let every request pass your eye first!