Founder notes

The Full Stack

Every service, every pattern, the real cost architecture. Built for engineers, CTOs, and founders evaluating AI infrastructure.

Building Collab365 Spaces

The business changed. So we rebuilt the system.

1. Before: 14 years. Legacy training business.
2. Pressure: 18 months. AI changed the ground.
3. After: Spaces. A rebuilt intelligence engine.

Raw founder notes: no vendor theatre, no recycled AI hype.


Mark Jones · Collab365

Section 01

The Cloudflare Platform

Every service runs inside Cloudflare's platform. One bill. One deployment pipeline. Zero servers. This is not a managed Kubernetes cluster with 12 microservices. It is 6 primitives doing exactly the job they were designed for.

Cloudflare Workers (Runtime)

Every API route, every webhook handler. Cold-start under 5ms globally. No servers, no containers, no EC2 instances. The entire backend runs at the edge.

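A minimal sketch of the "one Worker, every route" idea, assuming a hand-rolled route table keyed on method and path. The paths `/api/health` and `/webhooks/billing`, the `json` helper and the handler bodies are hypothetical placeholders, not the real Spaces routes. It uses only the standard `Request`, `Response` and `URL` globals, which the Workers runtime (and Node 18+) provides.

```typescript
// Sketch: one Worker serving both API routes and webhook handlers.
// Route paths and handler bodies are hypothetical placeholders.
type Handler = (request: Request) => Promise<Response>;

function json(data: unknown, status = 200): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Route table keyed on "METHOD /path".
const routes: Record<string, Handler> = {
  "GET /api/health": async () => json({ ok: true }),
  "POST /webhooks/billing": async (request) => {
    const payload = await request.text();
    // Signature verification and queueing would happen here (omitted).
    return json({ received: payload.length }, 202);
  },
};

const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    const handler = routes[`${request.method} ${pathname}`];
    return handler ? handler(request) : json({ error: "not found" }, 404);
  },
};

// In a deployed Worker this object is the module's default export:
// export default worker;
```

Keeping routing in a plain lookup table means every endpoint shares one deployment and one bill, which is the point the paragraph above makes.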

Cloudflare Workflows (Durable Orchestration)

Long-running AI jobs as durable, resumable workflows. A 13-stage Recipe pipeline that fails at stage 9 resumes from stage 9. Not from the beginning. Each step is independently retryable.

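The resume-from-stage-9 behaviour comes from step memoisation. In Cloudflare Workflows each unit of work is wrapped in `step.do(name, callback)` inside a `WorkflowEntrypoint`, and, as we understand the runtime, completed step results are persisted so a retried or resumed instance returns them instead of re-running those steps. The sketch below is a local, in-memory simulation of that idea so it runs outside the Workers runtime. `DurableRun` and `runRecipe` are hypothetical names, not the Workflows API, and the 13 stages are stand-ins.

```typescript
// Local simulation of durable, memoised steps (NOT the Cloudflare Workflows API).
// Completed step results are kept by step name; re-running the same instance
// returns stored results instead of executing those steps again.
class DurableRun {
  private readonly completed = new Map<string, unknown>();
  readonly executed: string[] = []; // names of steps that actually ran

  async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.completed.has(name)) return this.completed.get(name) as T;
    const result = await fn(); // a throw here leaves the step un-persisted
    this.completed.set(name, result);
    this.executed.push(name);
    return result;
  }
}

// Hypothetical 13-stage pipeline; `failAt` injects a failure at one stage.
async function runRecipe(run: DurableRun, failAt?: number): Promise<string> {
  let previous = "foundation";
  for (let stage = 1; stage <= 13; stage++) {
    previous = await run.do(`stage-${stage}`, async () => {
      if (stage === failAt) throw new Error(`stage ${stage} failed`);
      return `${previous} -> s${stage}`;
    });
  }
  return previous;
}
```

Running `runRecipe(run, 9)` fails after stages 1 to 8 are stored; calling `runRecipe(run)` on the same `run` executes only stages 9 to 13. In the real runtime, `step.do` also accepts per-step configuration (including retry limits), which is what makes each stage independently retryable.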

Cloudflare D1 (Relational Database)

SQLite at the edge. Stores posts, spaces, members, subscriptions, entitlements, and all structured metadata. Drizzle ORM for type-safe queries. Full SQL at < 1ms latency.

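A minimal sketch of a parameterised D1 lookup. `D1Like` is a hand-written subset of the D1 binding's `prepare(...).bind(...).first()` chain so the block runs without `@cloudflare/workers-types`; the `entitlements` table, its columns and `getEntitlement` are hypothetical. With Drizzle, as described above, the same lookup would go through its typed query builder instead; that is omitted here to keep the sketch dependency-free.

```typescript
// Hand-written subset of the D1 binding's statement API (the real types live
// in @cloudflare/workers-types as D1Database / D1PreparedStatement).
interface StatementLike {
  bind(...values: unknown[]): StatementLike;
  first<T>(): Promise<T | null>;
}
interface D1Like {
  prepare(query: string): StatementLike;
}

// Hypothetical row shape for an entitlements table.
interface Entitlement {
  member_id: string;
  space_id: string;
  tier: string;
}

// Values are bound as parameters, never interpolated into the SQL string.
function getEntitlement(db: D1Like, memberId: string, spaceId: string): Promise<Entitlement | null> {
  return db
    .prepare("SELECT member_id, space_id, tier FROM entitlements WHERE member_id = ? AND space_id = ? LIMIT 1")
    .bind(memberId, spaceId)
    .first<Entitlement>();
}
```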

Cloudflare R2 (Object Storage)

Every AI-generated document, every avatar profile, every research artefact, every course. Zero egress fees. The Foundation Layer (avatar + research) is cached here and reused by every downstream workflow.

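The Foundation Layer cache is a read-through pattern: look for the Space's foundation object in R2 and only run the expensive generation on a miss. A minimal sketch, assuming a hypothetical key layout `foundation/<spaceId>.json` and a `generate` callback standing in for the avatar and research jobs. `BucketLike` mirrors the subset of the R2 binding used here: `get` resolving to an object with `text()` or `null`, and `put`.

```typescript
// Subset of the R2 bucket binding used by this sketch.
interface BucketLike {
  get(key: string): Promise<{ text(): Promise<string> } | null>;
  put(key: string, value: string): Promise<unknown>;
}

// Hypothetical shape of the cached Foundation Layer artefacts.
interface Foundation {
  avatar: string;
  research: string[];
}

// Read-through cache: generate once per Space, then reuse from R2.
async function getFoundation(
  bucket: BucketLike,
  spaceId: string,
  generate: () => Promise<Foundation>,
): Promise<Foundation> {
  const key = `foundation/${spaceId}.json`; // hypothetical key layout
  const cached = await bucket.get(key);
  if (cached) return JSON.parse(await cached.text()) as Foundation;

  const fresh = await generate(); // the expensive agent + research run
  await bucket.put(key, JSON.stringify(fresh));
  return fresh;
}
```

Every downstream workflow calls `getFoundation` rather than `generate` directly, so only the first call per Space pays for generation.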

Cloudflare Vectorize (Vector Database)

Powers the Space Copilot's AutoRAG retrieval. Every document ingested via the Oracle extension is chunked, embedded, and indexed here. Semantic search runs before generation, so answers are grounded in the ingested documents instead of the model's memory.

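AutoRAG and Vectorize handle chunking, embedding and nearest-neighbour search in production (a Vectorize index binding exposes `upsert` and `query`). To show what that retrieval step does, here is an in-memory sketch of two of its pieces: word-based chunking with overlap, and cosine-similarity top-k over already-embedded chunks. The chunk size, overlap, function names and the toy two-dimensional vectors are illustrative assumptions; real embeddings come from an embedding model and have hundreds of dimensions.

```typescript
// Split text into word windows of `size`, each overlapping the previous by `overlap` words.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

interface IndexedChunk {
  id: string;
  values: number[]; // embedding vector
  text: string;
}

// Brute-force nearest neighbours; a vector database does this with an index.
function topK(query: number[], index: IndexedChunk[], k: number): IndexedChunk[] {
  return [...index].sort((x, y) => cosine(query, y.values) - cosine(query, x.values)).slice(0, k);
}
```

Only the top-k chunks are passed to the model, which is why grounded responses stay short and cheap.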

Cloudflare AI Gateway (Model Observability)

Every AI API call (Gemini, Claude, Perplexity, Workers AI) routes through AI Gateway. Full token counting, cost attribution, latency tracing, and response caching per request.

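Routing through AI Gateway is mostly a base-URL change: instead of calling a provider directly, requests go to `https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/<provider>/...`, and the gateway forwards them while recording usage. A minimal URL builder, assuming the provider slugs below match Cloudflare's current docs (verify before relying on them); `gatewayUrl` is a hypothetical helper and the account and gateway IDs are placeholders.

```typescript
// Provider path segments as we understand AI Gateway's URL scheme (check the docs).
type GatewayProvider = "google-ai-studio" | "anthropic" | "perplexity-ai" | "workers-ai";

const GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1";

// Builds the gateway URL that replaces a provider's own base URL.
function gatewayUrl(accountId: string, gatewayId: string, provider: GatewayProvider, path: string): string {
  const cleanPath = path.replace(/^\/+/, ""); // tolerate a leading slash
  return `${GATEWAY_BASE}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}/${provider}/${cleanPath}`;
}

// Example: an Anthropic Messages API call routed through a gateway named "spaces".
const url = gatewayUrl("ACCOUNT_ID", "spaces", "anthropic", "/v1/messages");
```

The request body and provider auth headers stay as they would be for a direct call; only the URL changes, which keeps the per-feature change small.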

Section 02

Model Governance

Zero hardcoded model IDs anywhere in the codebase. Every AI call requests a role, not a model name. A single settings row maps each role to a live model ID. Swapping Gemini 2.0 → 2.5 across every workflow takes 30 seconds. No deploys.

Role | Current model | Primary use cases | Cost tier
thinking | Gemini 2.5 Pro | Agent reasoning, complex synthesis, avatar creation | High
utility | Gemini 2.0 Flash | SEO generation, quick transformations, structured output | Low
research.deep | Perplexity Sonar Deep Research | Marketplace intelligence, paid frustration audit, blocker discovery | Medium
image | Cloudflare Workers AI (Flux) | AI-generated cover images, post thumbnails | Low

The practical payoff: when Gemini 2.5 Pro launched and benchmarked significantly better, we updated one database row. Every workflow requesting that role (Avatar, Briefing, Recipe) immediately used the better model. Teams with hardcoded model IDs had to run a grep, PR review, staging and deploy cycle.
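A minimal sketch of role-based model resolution, assuming the settings row stores a JSON object mapping each role to a model ID. The storage format and the helper names `parseModelSettings` and `modelFor` are hypothetical, and the model ID strings in the usage lines are assumptions about how each provider names the models in the table.

```typescript
const ROLES = ["thinking", "utility", "research.deep", "image"] as const;
type Role = (typeof ROLES)[number];
type ModelSettings = Record<Role, string>;

// Validates the settings row so a missing role fails loudly at load time.
function parseModelSettings(json: string): ModelSettings {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) throw new Error("settings row must be a JSON object");
  const record = parsed as Record<string, unknown>;
  const settings = {} as ModelSettings;
  for (const role of ROLES) {
    const id = record[role];
    if (typeof id !== "string" || id === "") throw new Error(`no model configured for role "${role}"`);
    settings[role] = id;
  }
  return settings;
}

// Call sites ask for a role; they never see a model ID literal.
function modelFor(settings: ModelSettings, role: Role): string {
  return settings[role];
}

// Usage: the row is the only place model IDs appear.
const settings = parseModelSettings(
  '{"thinking":"gemini-2.5-pro","utility":"gemini-2.0-flash",' +
    '"research.deep":"sonar-deep-research","image":"@cf/black-forest-labs/flux-1-schnell"}',
);
const model = modelFor(settings, "thinking"); // "gemini-2.5-pro"
```

Swapping a model means editing the row; because call sites depend only on the `Role` type, no code changes or deploys are needed.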

Section 03

Pattern Selection Guide

The biggest waste in AI engineering is using an agent when a direct call would do, or a direct call when a pipeline is required. This is how we select a pattern for any new feature.

Adaptive Agent (ReAct Loop)

Use when

Goal-oriented work requiring autonomous tool use across multiple steps

Used in

Avatar Creator, Briefing Creator, Space Suggestion

Cost profile

Highest. 16 reasoning rounds, 100k token cap, 6 web searches maximum

Not for

Never used on hot paths. Background generation only, never per-query work.

Perplexity Sonar Deep Research

Use when

Single-shot marketplace intelligence requiring 50+ internal searches

Used in

Paid Frustration Audit, Blocker Discoverer

Cost profile

Medium. Single API call, Perplexity handles search orchestration internally

Not for

Poor fit for tasks requiring structured tool calls back to our own APIs

Multi-Stage Pipeline (Cloudflare Workflows)

Use when

Complex multi-step generation where each stage has independent quality gates

Used in

Recipe Generator (13 stages), Tutorial Generator (7 stages), Knowledge Ingestion

Cost profile

Medium. Stage-level token budgets prevent runaway spend. Each stage is individually observable

Not for

Overkill for tasks with fewer than three steps. Direct calls are cheaper and simpler

Direct AI Call

Use when

Well-defined, single-prompt task with predictable, schema-constrained output

Used in

SEO metadata, display identity suggestion, post improvement, image prompt generation

Cost profile

Lowest. Temp 0.1–0.3, JSON schema enforcement, 1–2k token cap

Not for

Anything requiring tool use, web search, or multi-step reasoning

AutoRAG (Cloudflare)

Use when

Member-facing Q&A where hallucination is not acceptable

Used in

Space Copilot

Cost profile

Low. Vector search is cheap. Generation is grounded, so responses are short

Not for

Tasks where the answer might not be in the knowledge base. An agent is the better fit there
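The guide above can be read as a small decision function. This is a hedged encoding of it, not production code: the `FeatureProfile` fields are hypothetical flags a team would answer per feature, and the three-stage threshold mirrors the rule that pipelines are overkill for tasks with fewer than three steps.

```typescript
type Pattern = "autorag" | "deep-research" | "adaptive-agent" | "pipeline" | "direct-call";

// Hypothetical per-feature questions drawn from the guide's "Use when" / "Not for" rows.
interface FeatureProfile {
  memberFacingQA: boolean; // Q&A where hallucination is not acceptable
  answerInKnowledgeBase: boolean; // can retrieval be expected to find the answer?
  needsWebResearch: boolean; // broad marketplace search
  needsOwnApiTools: boolean; // structured tool calls back to our own APIs
  hotPath: boolean; // runs inline with a user request
  stages: number; // distinct generation steps with their own quality gates
}

function selectPattern(f: FeatureProfile): Pattern {
  if (f.memberFacingQA && f.answerInKnowledgeBase) return "autorag";
  if (f.needsOwnApiTools) {
    // Agents are the most expensive pattern and never run on hot paths.
    if (f.hotPath) throw new Error("adaptive agents are not allowed on hot paths");
    return "adaptive-agent";
  }
  if (f.needsWebResearch) return "deep-research";
  if (f.stages >= 3) return "pipeline";
  return "direct-call";
}
```

Ordering matters: the cheap, grounded path is checked first, and the agent is reached only when a feature genuinely needs to call back into its own APIs.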

Section 04

Cost Architecture

The single biggest cost control in the system is the Foundation Layer cache. Here is how it works:


Foundation Layer (once)

The Avatar, Micro-Niche and Paid Frustration jobs run once per Space, and their output is cached in R2 indefinitely. ~$0.08–$0.15 per Space generated, amortised across every downstream content piece.


Production Layer (per-run)

Briefings, Blockers and Recipes read from the cached Foundation. Perplexity Deep Research per blocker: ~$0.04. Briefing Creator agent per run: ~$0.02–$0.06. Recipe Generator (13 stages): ~$0.10–$0.18.


Interaction Layer (per-query)

Copilot queries via AutoRAG: sub-cent. Vector search is the cheapest compute in the stack. Rate-limited to 3 queries per member per day, which works as a cost lever and a UX lever at once.
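The amortisation argument is simple arithmetic: a one-off Foundation cost is divided across every piece generated from it. A sketch using the upper-bound figures quoted above ($0.15 Foundation, $0.18 per 13-stage Recipe), plus a hypothetical in-memory version of the 3-per-day Copilot limit. The function names and the quota key format are assumptions, and in production the counter would need durable, shared storage rather than a `Map`.

```typescript
// Effective cost per content piece once the Foundation cost is amortised.
function costPerPiece(foundationUsd: number, perRunUsd: number, pieces: number): number {
  if (pieces <= 0) throw new Error("pieces must be positive");
  return foundationUsd / pieces + perRunUsd;
}

// Upper-bound figures from the text: $0.15 Foundation, $0.18 per Recipe run.
const oneRecipe = costPerPiece(0.15, 0.18, 1); // ≈ $0.33
const tenRecipes = costPerPiece(0.15, 0.18, 10); // ≈ $0.195

// Hypothetical daily quota for Copilot queries, keyed by member and UTC date.
// A Map only lives as long as one isolate; real enforcement needs shared storage.
function tryConsumeQuery(counts: Map<string, number>, memberId: string, now: Date, limit = 3): boolean {
  const key = `${memberId}:${now.toISOString().slice(0, 10)}`;
  const used = counts.get(key) ?? 0;
  if (used >= limit) return false;
  counts.set(key, used + 1);
  return true;
}
```

By the tenth Recipe, the Foundation adds only $0.015 per piece, so per-run production cost dominates.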

// Token budget enforcement, applied at every stage
Avatar Agent:        maxRounds: 16, maxTokens: 100,000, maxWeb: 6
Briefing Agent:      maxRounds: 10, maxTokens: 80,000, maxWeb: 8
Recipe Stage (each): maxTokens: 8,000, temp: 0.2–0.6
Direct Call (SEO):   maxTokens: 1,500, temp: 0.2
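A minimal sketch of how agent caps like these can be enforced inside a loop, assuming a hypothetical `BudgetGuard` that the loop calls before each round, after each model response and before each web search. Token usage is only known after a response arrives, so the token check stops the loop at the first overrun rather than preventing it; a per-call output-token limit bounds that overrun.

```typescript
interface AgentBudget {
  maxRounds: number;
  maxTokens: number;
  maxWeb: number;
}

// Caps copied from the listing above.
const AGENT_BUDGETS: Record<"avatar" | "briefing", AgentBudget> = {
  avatar: { maxRounds: 16, maxTokens: 100_000, maxWeb: 6 },
  briefing: { maxRounds: 10, maxTokens: 80_000, maxWeb: 8 },
};

// Hypothetical guard: throws as soon as a cap is hit so the agent loop stops.
class BudgetGuard {
  private readonly budget: AgentBudget;
  private rounds = 0;
  private tokens = 0;
  private webSearches = 0;

  constructor(budget: AgentBudget) {
    this.budget = budget;
  }

  beginRound(): void {
    if (this.rounds >= this.budget.maxRounds) throw new Error(`round cap ${this.budget.maxRounds} reached`);
    this.rounds += 1;
  }

  recordTokens(used: number): void {
    this.tokens += used;
    if (this.tokens > this.budget.maxTokens) throw new Error(`token cap ${this.budget.maxTokens} exceeded`);
  }

  beginWebSearch(): void {
    if (this.webSearches >= this.budget.maxWeb) throw new Error(`web search cap ${this.budget.maxWeb} reached`);
    this.webSearches += 1;
  }
}
```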

Building something similar?

We consult with engineering teams who want to build AI pipelines on Cloudflare. We've made every mistake documented above, and rebuilt it right.
