How It Works
A fully autonomous compliance intelligence pipeline — no humans in the loop for daily operations.
Web3 Compliance AI is not a traditional research firm. There are no analysts writing reports. Instead, a fleet of GPU-powered AI workers autonomously research regulations, verify facts against primary legislation, monitor source changes, and publish updates — all at minimal API costs (~$0.10/day).
The Pipeline
Research
Research tasks are submitted to the HTTP API at research-api.justfixit.ai, which dispatches them to a fleet of 4 self-hosted GPU servers. Each task runs through the worker-farm's adversarial pipeline (SearXNG web search plus multi-model challenge loops) to research a specific jurisdiction's regulations, citing official sources.
Pipeline: Generate → Challenge → Revise → Score → Verify
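The five stages can be sketched as a small control loop. This is a minimal illustration, not the worker-farm's actual code: the stage functions, the `Draft` type, and the `max_rounds` cap are all hypothetical names, and each stage is passed in as a plain callable so the flow can be read without any model or search backend.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Draft:
    text: str
    score: float
    verified: bool


def run_pipeline(
    task: str,
    generate: Callable[[str], str],
    challenge: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    score: Callable[[str], float],
    verify: Callable[[str], bool],
    max_rounds: int = 2,  # assumed cap on challenge/revise rounds
) -> Draft:
    """Generate -> Challenge -> Revise -> Score -> Verify, in that order."""
    text = generate(task)
    for _ in range(max_rounds):
        objections = challenge(text)
        if not objections:
            break  # nothing left to contest, stop revising
        text = revise(text, objections)
    return Draft(text=text, score=score(text), verified=verify(text))
```

Passing the stages as callables keeps the sketch independent of any particular model provider; in practice each one would wrap a model call or a search query.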
Fact Extraction
Research output is parsed into individual fact records — each with a specific claim, source URL, jurisdiction, and topic. One research file might produce 5-15 distinct facts about regulators, laws, licensing requirements, and tax treatment.
Source: https://bcb.gov.br → Confidence: 0.5 (unverified)
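A fact record with the four fields named above might look like the following sketch. The class name, field names, the `verified` flag, and the `to_facts` helper are assumptions for illustration; only the field list and the 0.5 starting confidence come from the text.

```python
from dataclasses import dataclass

UNVERIFIED_CONFIDENCE = 0.5  # starting score stated in the text


@dataclass(frozen=True)
class Fact:
    claim: str
    source_url: str
    jurisdiction: str
    topic: str
    confidence: float = UNVERIFIED_CONFIDENCE
    verified: bool = False

    def summary(self) -> str:
        label = "verified" if self.verified else "unverified"
        return f"Source: {self.source_url} → Confidence: {self.confidence} ({label})"


def to_facts(records: list[dict]) -> list[Fact]:
    """Keep only records that carry all four required fields (assumed rule)."""
    required = ("claim", "source_url", "jurisdiction", "topic")
    return [
        Fact(**{key: record[key] for key in required})
        for record in records
        if all(record.get(key) for key in required)
    ]
```

Dropping records that lack a source URL at this point is one possible design choice; it means every stored fact can later be checked against something.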
Verification
Every week, an automated validator checks every fact's source URL. If the URL returns HTTP 200 and the content hasn't changed, the fact is marked verified with increased confidence. Dead links get flagged and automatically re-researched. Additionally, the worker-farm's 5-stage adversarial pipeline (Generate → Challenge → Revise → Score → Verify) ensures research quality before facts are ever ingested.
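The weekly check can be pictured as a small state transition per fact. The sketch below is illustrative only: the `CheckedFact` type, the status strings, the stored `snapshot` comparison, and the `boost` of 0.1 are assumptions, since the text does not say how much confidence increases or how content is compared. The `fetch` callable stands in for a real HTTP client.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class CheckedFact:
    source_url: str
    snapshot: str  # page content seen at the last check (assumed storage)
    confidence: float = 0.5
    status: str = "unverified"


# fetch(url) returns (HTTP status code, page body); injected for testability
Fetch = Callable[[str], Tuple[int, str]]


def validate(fact: CheckedFact, fetch: Fetch, boost: float = 0.1) -> CheckedFact:
    status_code, body = fetch(fact.source_url)
    if status_code != 200:
        # Dead link: flag it so it can be re-researched
        return replace(fact, status="dead_link")
    if body != fact.snapshot:
        # Source changed: confidence is left untouched (assumed behavior)
        return replace(fact, status="changed")
    # Reachable and unchanged: mark verified and raise confidence, capped at 1.0
    return replace(
        fact, status="verified", confidence=min(1.0, fact.confidence + boost)
    )
```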
Publishing
A 27-stage build pipeline runs 3x/day via GitHub Actions (04:00, 09:00, 16:00 UTC). The pipeline extracts articles, repairs facts, grades evidence, rebuilds country profiles, regenerates the JSON API, and deploys to Cloudflare's global edge network (200+ data centers). Zero manual intervention.
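As a small illustration of the schedule, the function below computes the next build slot from the three UTC times listed above. It is a sketch for reasoning about the timetable, not part of the build itself, and the function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

BUILD_TIMES_UTC = (time(4, 0), time(9, 0), time(16, 0))  # from the schedule above


def next_build(now: datetime) -> datetime:
    """Return the first scheduled slot strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    for slot in BUILD_TIMES_UTC:
        candidate = datetime.combine(now.date(), slot, tzinfo=timezone.utc)
        if candidate > now:
            return candidate
    # Past the last slot of the day: roll over to tomorrow's first slot
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, BUILD_TIMES_UTC[0], tzinfo=timezone.utc)
```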
Monitoring
156 official regulatory source URLs are checked daily using SHA-256 content hashing. When a regulator updates their guidance, the system detects the change and queues new research to update affected facts.
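Change detection by content hashing can be sketched in a few lines. The function names and the in-memory `known_hashes` store are assumptions for illustration; the page bodies are passed in already fetched so the example stays free of network code.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """SHA-256 hex digest of a page body."""
    return hashlib.sha256(body).hexdigest()


def detect_changes(known_hashes: dict[str, str], fetched: dict[str, bytes]) -> list[str]:
    """Return URLs whose content differs from the last recorded hash.

    `known_hashes` is updated in place. A URL seen for the first time is
    recorded but not reported as changed (assumed behavior).
    """
    changed = []
    for url, body in fetched.items():
        digest = content_hash(body)
        previous = known_hashes.get(url)
        if previous is not None and previous != digest:
            changed.append(url)  # candidate for a new research task
        known_hashes[url] = digest
    return changed
```

Hashing the raw body flags any byte-level difference, including cosmetic ones such as timestamps or rotating banners; whether the production system normalizes pages before hashing is not stated in the text.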
Self-Healing
If research goes stale (no new data in 7 days), the system automatically requeues research — up to 2,000 tasks/cycle. Bad files are quarantined and rescued at build time. Failed builds auto-retry or revert. Source URLs that fail 3 times are suspended. No human intervention needed.
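Two of these rules, the 7-day staleness window with its 2,000-task cap and the three-strike suspension of source URLs, can be sketched as follows. The function names, the oldest-first ordering, and the choice to count failures cumulatively rather than consecutively are assumptions not stated in the text.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)
MAX_REQUEUE_PER_CYCLE = 2000
MAX_SOURCE_FAILURES = 3


def select_stale(last_updated: dict[str, datetime], now: datetime) -> list[str]:
    """Pick jurisdictions with no new data for 7+ days, capped per cycle."""
    stale = [key for key, ts in last_updated.items() if now - ts >= STALE_AFTER]
    stale.sort(key=lambda key: last_updated[key])  # oldest first (assumed priority)
    return stale[:MAX_REQUEUE_PER_CYCLE]


def record_failure(failures: dict[str, int], url: str) -> bool:
    """Count a failed fetch; return True once the URL should be suspended."""
    failures[url] = failures.get(url, 0) + 1
    return failures[url] >= MAX_SOURCE_FAILURES
```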
The Infrastructure
GPU Workers
Four self-hosted GPU nodes (Lab-1 through Lab-4) running 24/7. They handle research, fact extraction, validation, and inference, and are connected via a Tailscale mesh network.
Worker-Farm API
HTTP API at research-api.justfixit.ai dispatches tasks to the GPU fleet. 5-stage adversarial pipeline (Generate → Challenge → Revise → Score → Verify). Rate limits: 10,000 tasks/day.
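A daily cap like the 10,000 tasks/day limit can be enforced with a counter that resets when the date changes. This is a sketch under assumptions: the `DailyQuota` class is hypothetical, and the text does not say where the day boundary falls or whether rejected tasks are queued or dropped.

```python
from datetime import date

DAILY_TASK_LIMIT = 10_000  # from the rate limit above


class DailyQuota:
    def __init__(self, limit: int = DAILY_TASK_LIMIT) -> None:
        self.limit = limit
        self._day = None
        self._used = 0

    def try_acquire(self, today: date, count: int = 1) -> bool:
        """Reserve `count` tasks for `today`; return False if over the limit."""
        if today != self._day:
            # New day: reset the counter
            self._day = today
            self._used = 0
        if self._used + count > self.limit:
            return False
        self._used += count
        return True
```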
LiteLLM Proxy
Routes AI requests to 10+ providers: Gemini Flash, DeepSeek, Groq, Cerebras, OpenRouter, Ollama (local GPU), and more. Worker-farm handles all model routing.
Cloudflare Edge
Site served from 200+ global PoPs. D1 database for MCP queries. KV for caching. Workers for API and MCP server. Cloudflare free tier for hosting.
MCP Server
AI assistants (Claude, GPT, etc.) can query all compliance data via the Model Context Protocol. 7 tools + 10 resources at mcp.web3compliance.ai.
Fleet Brain
GCP e2-micro (free tier) orchestrates the worker fleet. Detects new commits, triggers builds, monitors health, and dispatches scheduled tasks.
Why This Approach
Traditional compliance data requires teams of analysts, manual research, and enterprise pricing ($50K-100K/year). Updates are periodic. Coverage is limited to what the team can handle.
Our approach uses autonomous AI workers that never sleep, never take vacations, and research every jurisdiction simultaneously. The cost is self-hosted GPU hardware, ~$30/mo electricity, and ~$3-8/mo in API costs. The result is the same data — sourced from the same official regulators — but updated daily instead of quarterly, at a fraction of the cost.
The tradeoff: AI-extracted facts start at 0.5 confidence ("unverified"). They become trustworthy through automated verification against source URLs — the same URLs a human analyst would check. We're transparent about what's verified and what isn't. Every fact shows its confidence score and source.