The Full Stack
Every service, every pattern, real cost architecture. Built for engineers, CTOs, and founders evaluating AI infrastructure.
The Cloudflare Platform
Every service runs inside Cloudflare's platform. One bill. One deployment pipeline. Zero servers. This is not a managed Kubernetes cluster with 12 microservices. It is 6 primitives, each doing exactly the job it was designed for.
Every API route, every webhook handler. Cold-start under 5ms globally. No servers, no containers, no EC2 instances. The entire backend runs at the edge.
Long-running AI jobs run as durable, resumable workflows. A 13-stage Recipe pipeline that fails at stage 9 resumes from stage 9, not from the beginning. Each step is independently retryable.
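The resume-from-the-failed-stage behaviour can be illustrated with a minimal in-memory sketch. It is not the Cloudflare Workflows API: `Step`, `Checkpoints`, and `runPipeline` are hypothetical names, a `Map` stands in for durable storage, and steps are synchronous for brevity where real steps would be asynchronous.

```typescript
// Minimal checkpoint-and-resume sketch. All names are hypothetical.
type Step = { name: string; run: (input: string) => string };

// Stand-in for durable storage: step name -> saved output.
type Checkpoints = Map<string, string>;

function runPipeline(steps: Step[], input: string, checkpoints: Checkpoints): string {
  let value = input;
  for (const step of steps) {
    const saved = checkpoints.get(step.name);
    if (saved !== undefined) {
      value = saved; // completed on an earlier attempt: reuse its output
      continue;
    }
    value = step.run(value); // may throw; earlier checkpoints are kept
    checkpoints.set(step.name, value);
  }
  return value;
}

// Demo: 13 stages, where stage 9 fails on its first execution only.
const executions: number[] = [];
let stage9ShouldFail = true;

const stages: Step[] = Array.from({ length: 13 }, (_, i) => {
  const n = i + 1;
  return {
    name: `stage-${n}`,
    run: (input: string) => {
      executions.push(n);
      if (n === 9 && stage9ShouldFail) {
        stage9ShouldFail = false;
        throw new Error("stage 9 failed");
      }
      return `${input}>${n}`;
    },
  };
});

const checkpoints: Checkpoints = new Map();
let firstAttemptError = "";
try {
  runPipeline(stages, "start", checkpoints);
} catch (err) {
  firstAttemptError = (err as Error).message;
}

const executedBeforeRetry = executions.length;
const result = runPipeline(stages, "start", checkpoints);
// Stages executed during the second attempt only.
const executedOnRetry = executions.slice(executedBeforeRetry);
```

On the second attempt, stages 1 to 8 are served from their checkpoints, so only stages 9 to 13 execute. That is the property the paragraph above describes.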
SQLite at the edge. Stores posts, spaces, members, subscriptions, entitlements, and all structured metadata. Drizzle ORM for type-safe queries. Full SQL at < 1ms latency.
Every AI-generated document, every avatar profile, every research artefact, every course. Zero egress fees. The Foundation Layer (avatar + research) is cached here and reused by every downstream workflow.
Powers the Space Copilot's AutoRAG retrieval. Every document ingested via the Oracle extension is chunked, embedded, and indexed here. Semantic search runs before generation, so answers are grounded in retrieved documents, which reduces hallucination.
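The chunk, embed, index, then search flow can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a word-count stand-in for a real embedding model, the in-memory array stands in for a vector index, and every name here (`chunk`, `ingest`, `search`, `VOCAB`) is hypothetical rather than taken from the platform.

```typescript
// Toy retrieval sketch: chunk -> embed -> index -> semantic search.
type IndexedChunk = { id: string; text: string; vector: number[] };

// Tiny fixed vocabulary; a real embedding model replaces this entirely.
const VOCAB = ["refund", "billing", "invoice", "avatar", "research", "course"];

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Split a document into chunks of at most maxWords words.
function chunk(doc: string, maxWords: number): string[] {
  const words = doc.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function ingest(target: IndexedChunk[], docId: string, doc: string, maxWords = 8): void {
  chunk(doc, maxWords).forEach((text, i) => {
    target.push({ id: `${docId}#${i}`, text, vector: embed(text) });
  });
}

// Return the topK chunks most similar to the query, best first.
function search(source: IndexedChunk[], query: string, topK: number): IndexedChunk[] {
  const q = embed(query);
  return source
    .map((c) => ({ c, score: cosine(q, c.vector) }))
    .filter((x) => x.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.c);
}

const chunkIndex: IndexedChunk[] = [];
ingest(chunkIndex, "billing-faq", "Refund requests go through billing. Each invoice lists the refund window.");
ingest(chunkIndex, "onboarding", "Every course starts with avatar research before any content is drafted.");

const hits = search(chunkIndex, "How do I get a refund on an invoice?", 1);
// Retrieved text is placed in the prompt as context before generation.
const promptContext = hits.map((h) => h.text).join("\n");
```

Retrieval narrows what the model sees to the most relevant chunks. The quality of the answer still depends on what was ingested and how well the embedding model separates topics.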
Every AI API call (Gemini, Claude, Perplexity, Workers AI) routes through AI Gateway. Full token counting, cost attribution, latency tracing, and prompt caching per request.
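Per-request cost attribution comes down to simple arithmetic over token counts. The sketch below assumes a hypothetical `UsageRecord` shape and placeholder rates in USD per million tokens. The rates are made-up numbers chosen for easy arithmetic, not real provider prices, and nothing here is the AI Gateway API.

```typescript
// Hypothetical per-request usage record, one per AI call.
type UsageRecord = {
  role: string; // e.g. "thinking" or "utility"
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
};

// Placeholder rates in USD per million tokens. NOT real prices.
type Rate = { inputPerMTok: number; outputPerMTok: number };

function requestCost(u: UsageRecord, rate: Rate): number {
  return (u.inputTokens * rate.inputPerMTok + u.outputTokens * rate.outputPerMTok) / 1_000_000;
}

// Sum the cost of every logged request, grouped by role.
function costByRole(log: UsageRecord[], rates: Record<string, Rate>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of log) {
    const rate = rates[u.role];
    if (!rate) throw new Error(`no rate configured for role "${u.role}"`);
    totals.set(u.role, (totals.get(u.role) ?? 0) + requestCost(u, rate));
  }
  return totals;
}

const placeholderRates: Record<string, Rate> = {
  thinking: { inputPerMTok: 2, outputPerMTok: 10 },
  utility: { inputPerMTok: 1, outputPerMTok: 4 },
};

const usageLog: UsageRecord[] = [
  { role: "thinking", inputTokens: 500_000, outputTokens: 100_000, latencyMs: 4200 },
  { role: "utility", inputTokens: 1_000_000, outputTokens: 250_000, latencyMs: 800 },
  { role: "utility", inputTokens: 1_000_000, outputTokens: 250_000, latencyMs: 750 },
];

const costTotals = costByRole(usageLog, placeholderRates);
```

Grouping by role rather than by model name keeps the report stable when the model behind a role changes.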
Model Governance
Zero hardcoded model IDs anywhere in the codebase. Every AI call requests a role, not a model name. A single settings row maps each role to a live model ID. Swapping Gemini 2.0 → 2.5 across every workflow takes 30 seconds. No deploys.
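A minimal sketch of role-based resolution, assuming the settings row can be modelled as a plain object. `ModelSettings`, `resolveModel`, and `buildRequest` are hypothetical names, and the model ID strings are illustrative placeholders rather than verified provider IDs.

```typescript
// Callers request a role, never a model name.
type Role = "thinking" | "utility" | "research.deep" | "image";

// Stand-in for the single settings row: role -> live model ID.
type ModelSettings = Record<Role, string>;

// Placeholder IDs for illustration only.
let settings: ModelSettings = {
  thinking: "gemini-2.5-pro",
  utility: "gemini-2.0-flash",
  "research.deep": "sonar-deep-research",
  image: "flux",
};

function resolveModel(role: Role, current: ModelSettings): string {
  return current[role];
}

// The model ID is looked up at call time, so no caller hardcodes it.
function buildRequest(role: Role, prompt: string, current: ModelSettings) {
  return { model: resolveModel(role, current), prompt };
}

const requestBefore = buildRequest("utility", "Write a meta description", settings);

// Swapping the model is a settings change, not a code change or a deploy.
settings = { ...settings, utility: "gemini-2.5-flash" };

const requestAfter = buildRequest("utility", "Write a meta description", settings);
```

Because `Role` is a closed union type, a typo in a role name fails at compile time instead of surfacing as a runtime lookup miss.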
thinking: Gemini 2.5 Pro. Agent reasoning, complex synthesis, avatar creation. Cost: High
utility: Gemini 2.0 Flash. SEO generation, quick transformations, structured output. Cost: Low
research.deep: Perplexity Sonar Deep. Marketplace intelligence, paid frustration audit, blocker discovery. Cost: Medium
image: Cloudflare Workers AI (Flux). AI-generated cover images, post thumbnails. Cost: Low
Pattern Selection Guide
The biggest waste in AI engineering is using an agent when a direct call would do, or a direct call when a pipeline is required. This is how we select a pattern for any new feature.
Adaptive Agent (ReAct Loop)
Perplexity Sonar Deep Research
Multi-Stage Pipeline (Cloudflare Workflows)
Direct AI Call
AutoRAG (Cloudflare)
Cost Architecture
The single biggest cost control in the system is the Foundation Layer cache. Here is how it works:
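A minimal cache-aside sketch of the idea, assuming the Foundation Layer (avatar + research) is built once per space and then read by every downstream workflow. The `Map` stands in for object storage, and `buildFoundation`, `getFoundation`, and the key layout are hypothetical.

```typescript
// Cache-aside sketch for the Foundation Layer. All names are hypothetical.
type Foundation = { avatar: string; research: string };

// Stand-in for object storage: key -> cached Foundation.
const store = new Map<string, Foundation>();

// Counts how often the expensive generation actually runs.
let expensiveBuilds = 0;

// Stand-in for the expensive avatar + research generation.
function buildFoundation(spaceId: string): Foundation {
  expensiveBuilds += 1;
  return { avatar: `avatar for ${spaceId}`, research: `research for ${spaceId}` };
}

function getFoundation(spaceId: string): Foundation {
  const key = `foundation/${spaceId}.json`; // hypothetical key layout
  const cached = store.get(key);
  if (cached) return cached; // cache hit: no model calls
  const fresh = buildFoundation(spaceId);
  store.set(key, fresh);
  return fresh;
}

// Three downstream workflows for the same space share one build.
const outputs = ["course", "post", "seo"].map(
  (workflow) => `${workflow} using ${getFoundation("space-1").avatar}`,
);
```

The saving scales with the number of downstream workflows per space: the expensive calls run once, and every later workflow pays only for a storage read.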
Building something similar?
We consult with engineering teams who want to build AI pipelines on Cloudflare. We've made every mistake documented above, and rebuilt it right.