Founder notes

The Full Stack

Every service, every pattern, real cost architecture. Built for engineers, CTOs, and founders evaluating AI infrastructure.

Building Collab365 Spaces
The business changed. So we rebuilt the system.
1. Before: 14 years. Legacy training business.
2. Pressure: 18 months. AI changed the ground.
3. After: Spaces. A rebuilt intelligence engine.
Raw founder notes: no vendor theatre, no recycled AI hype
Mark Jones · Collab365
Section 01

The Cloudflare Platform

Every service runs inside Cloudflare's platform. One bill. One deployment pipeline. Zero servers. This is not a managed Kubernetes cluster with 12 microservices. It is 6 primitives doing exactly the job they were designed for.

Cloudflare Workers · Runtime

Every API route, every webhook handler. Cold-start under 5ms globally. No servers, no containers, no EC2 instances. The entire backend runs at the edge.

Cloudflare Workflows · Durable Orchestration

Long-running AI jobs as durable, resumable workflows. A 13-stage Recipe pipeline that fails at stage 9 resumes from stage 9. Not from the beginning. Each step is independently retryable.
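
To make the resume-from-stage-9 behaviour concrete, here is a minimal sketch of step-level checkpointing. It is a synchronous, in-memory simulation and deliberately does not use the Cloudflare Workflows API. The names `Stage`, `CheckpointedPipeline`, and the 13 demo stages are hypothetical stand-ins for illustration only.

```typescript
// A synchronous, in-memory sketch of step-level checkpointing.
// This is NOT the Cloudflare Workflows API; the Map below only imitates
// the durable step state that a workflow engine would persist for you.

interface Stage {
  name: string;
  run: (input: string) => string; // may throw on a transient failure
}

class CheckpointedPipeline {
  private stages: Stage[];
  // Output of every completed stage, keyed by stage name (hypothetical storage).
  private checkpoints = new Map<string, string>();

  constructor(stages: Stage[]) {
    this.stages = stages;
  }

  // Runs stages in order. A stage with a saved checkpoint is skipped and its
  // saved output is reused, so a retry picks up at the first unfinished stage.
  run(initialInput: string): string {
    let current = initialInput;
    for (const stage of this.stages) {
      const saved = this.checkpoints.get(stage.name);
      if (saved !== undefined) {
        current = saved;
        continue;
      }
      current = stage.run(current); // if this throws, earlier checkpoints survive
      this.checkpoints.set(stage.name, current);
    }
    return current;
  }
}

// Demo: 13 stages, where stage 9 fails on its first attempt only.
const executed: number[] = [];
let stageNineShouldFail = true;
const demoStages: Stage[] = [];
for (let i = 1; i <= 13; i++) {
  demoStages.push({
    name: `stage-${i}`,
    run: (input: string): string => {
      executed.push(i);
      if (i === 9 && stageNineShouldFail) {
        stageNineShouldFail = false;
        throw new Error("transient failure at stage 9");
      }
      return `${input}>${i}`;
    },
  });
}

const pipeline = new CheckpointedPipeline(demoStages);
let firstAttemptFailed = false;
try {
  pipeline.run("start");
} catch (_err) {
  firstAttemptFailed = true;
}
// The second call skips stages 1 to 8 and resumes at stage 9.
const finalOutput = pipeline.run("start");
```

In the demo, stages 1 to 8 execute exactly once across both attempts. Only stage 9 runs twice, which is the property that keeps a late failure from re-spending tokens on early stages.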

Cloudflare D1 · Relational Database

SQLite at the edge. Stores posts, spaces, members, subscriptions, entitlements, and all structured metadata. Drizzle ORM for type-safe queries. Full SQL at < 1ms latency.

Cloudflare R2 · Object Storage

Every AI-generated document, every avatar profile, every research artefact, every course. Zero egress fees. The Foundation Layer (avatar + research) is cached here and reused by every downstream workflow.
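
The cache-and-reuse idea can be sketched as a get-or-create lookup. This is an illustration under assumptions, not our production code: `ObjectStore` is a simplified synchronous stand-in (real R2 bindings are asynchronous), and the key layout, `getOrCreateFoundation`, and `fakeGenerate` are hypothetical names.

```typescript
// Minimal object-store shape. Real R2 bindings are asynchronous and richer;
// this synchronous interface is a simplification for illustration.
interface ObjectStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

class InMemoryStore implements ObjectStore {
  private objects = new Map<string, string>();
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
  put(key: string, value: string): void {
    this.objects.set(key, value);
  }
}

// Returns the cached Foundation document for a Space, generating and storing
// it only on the first request. The key layout is a hypothetical example.
function getOrCreateFoundation(
  store: ObjectStore,
  spaceId: string,
  generate: (spaceId: string) => string,
): string {
  const key = `foundation/${spaceId}.json`;
  const cached = store.get(key);
  if (cached !== undefined) {
    return cached; // cache hit: no model spend
  }
  const fresh = generate(spaceId); // the expensive generation happens here
  store.put(key, fresh);
  return fresh;
}

// Demo: two reads for the same Space trigger a single generation.
let generationRuns = 0;
const fakeGenerate = (spaceId: string): string => {
  generationRuns += 1;
  return JSON.stringify({ spaceId, avatar: "example avatar profile" });
};
const foundationStore = new InMemoryStore();
const firstRead = getOrCreateFoundation(foundationStore, "space-123", fakeGenerate);
const secondRead = getOrCreateFoundation(foundationStore, "space-123", fakeGenerate);
```

Every downstream workflow that calls the lookup after the first one pays a storage read instead of a generation run.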

Cloudflare Vectorize · Vector Database

Powers the Space Copilot's AutoRAG retrieval. Every document ingested via the Oracle extension is chunked, embedded, and indexed here. Semantic search before generation. Zero hallucination.
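
A toy version of "semantic search before generation" looks roughly like this. It is a sketch, not the Vectorize or AutoRAG API: the two-dimensional vectors are hand-written stand-ins for real embeddings, and `IndexedChunk`, `retrieveTopK`, and `buildGroundedPrompt` are hypothetical names.

```typescript
interface IndexedChunk {
  id: string;
  text: string;
  vector: number[]; // in production this would come from an embedding model
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / Math.sqrt(normA * normB);
}

// Scores every chunk against the query and keeps the k most similar.
function retrieveTopK(index: IndexedChunk[], query: number[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.vector, query) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// The retrieved chunks become the only material the model may answer from.
function buildGroundedPrompt(question: string, context: IndexedChunk[]): string {
  const sources = context.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return `Answer using only these sources:\n${sources}\n\nQuestion: ${question}`;
}

// Demo with hand-written 2-D vectors.
const toyIndex: IndexedChunk[] = [
  { id: "a", text: "How to configure sharing links.", vector: [1, 0] },
  { id: "b", text: "Quarterly billing schedule.", vector: [0, 1] },
  { id: "c", text: "Sharing settings and billing overview.", vector: [0.7, 0.7] },
];
const topChunks = retrieveTopK(toyIndex, [1, 0], 2);
const groundedPrompt = buildGroundedPrompt("How do I share a file?", topChunks);
```

A real vector index replaces the full scan with an approximate nearest-neighbour lookup, but the retrieve-then-generate order is the same.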

Cloudflare AI Gateway · Model Observability

Every AI API call (Gemini, Claude, Perplexity, Workers AI) routes through AI Gateway. Full token counting, cost attribution, latency tracing, and prompt caching per request.

Section 02

Model Governance

Zero hardcoded model IDs anywhere in the codebase. Every AI call requests a role, not a model name. A single settings row maps each role to a live model ID. Swapping Gemini 2.0 → 2.5 across every workflow takes 30 seconds. No deploys.

Role | Current Model | Primary Use Cases | Cost Tier
thinking | Gemini 2.5 Pro | Agent reasoning, complex synthesis, avatar creation | High
utility | Gemini 2.0 Flash | SEO generation, quick transformations, structured output | Low
research.deep | Perplexity Sonar Deep | Marketplace intelligence, paid frustration audit, blocker discovery | Medium
image | Cloudflare Workers AI (Flux) | AI-generated cover images, post thumbnails | Low
The practical payoff

When Gemini 2.5 Pro launched and benchmarked significantly better, we updated one database row. Every Foundation Layer workflow - Avatar, Briefing, Recipe - immediately used the better model. Teams with hardcoded model IDs ran grep + PR review + staging + deploy cycles.
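
The role-to-model indirection can be sketched in a few lines. This is an assumption-laden illustration: `ModelRegistry`, `describeCall`, and `hypothetical-newer-model` are made-up names, an in-memory Map stands in for the settings row, and the model values reuse the display names from the table rather than real provider IDs.

```typescript
type ModelRole = "thinking" | "utility" | "research.deep" | "image";

// Stand-in for the settings row. Values reuse the display names from the
// role table; real provider model IDs would go here instead.
class ModelRegistry {
  private assignments = new Map<ModelRole, string>();

  constructor(initial: Array<[ModelRole, string]>) {
    for (const [role, model] of initial) {
      this.assignments.set(role, model);
    }
  }

  // Looks up the model currently assigned to a role.
  resolve(role: ModelRole): string {
    const model = this.assignments.get(role);
    if (model === undefined) {
      throw new Error(`No model configured for role "${role}"`);
    }
    return model;
  }

  // The equivalent of updating one settings row: no call site changes.
  reassign(role: ModelRole, model: string): void {
    this.assignments.set(role, model);
  }
}

// Call sites name a role, never a model.
function describeCall(registry: ModelRegistry, role: ModelRole, task: string): string {
  return `${task} -> ${registry.resolve(role)}`;
}

// Demo: swap the model behind "utility" without touching the call site.
const registry = new ModelRegistry([
  ["thinking", "Gemini 2.5 Pro"],
  ["utility", "Gemini 2.0 Flash"],
  ["research.deep", "Perplexity Sonar Deep"],
  ["image", "Cloudflare Workers AI (Flux)"],
]);
const beforeSwap = describeCall(registry, "utility", "SEO metadata");
registry.reassign("utility", "hypothetical-newer-model");
const afterSwap = describeCall(registry, "utility", "SEO metadata");
```

Because the union type limits roles to a known set, a typo in a role name fails at compile time instead of silently falling back to a default model.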
Section 03

Pattern Selection Guide

The biggest waste in AI engineering is using an agent when a direct call would do, or a direct call when a pipeline is required. This is how we select a pattern for any new feature.

01

Adaptive Agent (ReAct Loop)

Use when
Goal-oriented work requiring autonomous tool use across multiple steps
Used in
Avatar Creator, Briefing Creator, Space Suggestion
Cost profile
Highest. 16 reasoning rounds, 100k token cap, 6 web searches maximum
Not for
Never used on hot paths. Foundation Layer only.
02

Perplexity Sonar Deep Research

Use when
Single-shot marketplace intelligence requiring 50+ internal searches
Used in
Paid Frustration Audit, Blocker Discoverer
Cost profile
Medium. Single API call, Perplexity handles search orchestration internally
Not for
Poor fit for tasks requiring structured tool calls back to our own APIs
03

Multi-Stage Pipeline (Cloudflare Workflows)

Use when
Complex multi-step generation where each stage has independent quality gates
Used in
Recipe Generator (13 stages), Tutorial Generator (7 stages), Knowledge Ingestion
Cost profile
Medium. Stage-level token budgets prevent runaway spend. Each stage is individually observable
Not for
Overkill for sub-3-step tasks. Direct calls are cheaper and simpler
04

Direct AI Call

Use when
Well-defined, single-prompt task with deterministic output
Used in
SEO metadata, display identity suggestion, post improvement, image prompt generation
Cost profile
Lowest. Temp 0.1–0.3, JSON schema enforcement, 1–2k token cap
Not for
Anything requiring tool use, web search, or multi-step reasoning
05

AutoRAG (Cloudflare)

Use when
Member-facing Q&A where hallucination is not acceptable
Used in
Space Copilot
Cost profile
Low. Vector search is cheap. Generation is grounded, so responses are short
Not for
Tasks where the answer might not be in the knowledge base. Agent is better
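
One way to read the five cards above is as a decision function. The sketch below is our guide restated as code, with assumptions: the trait names are hypothetical, and the order in which the rules are checked is a simplification we chose for the example, not a rule the guide itself fixes.

```typescript
type Pattern =
  | "adaptive-agent"
  | "sonar-deep-research"
  | "multi-stage-pipeline"
  | "direct-call"
  | "autorag";

// Hypothetical description of a new feature.
interface FeatureTraits {
  memberFacingQA: boolean; // members ask questions and expect grounded answers
  answerInKnowledgeBase: boolean; // the answer is expected to exist in ingested docs
  needsAutonomousToolUse: boolean; // goal-oriented work across multiple tool calls
  needsDeepMarketResearch: boolean; // single-shot research needing many web searches
  stageCount: number; // number of distinct generation stages
}

// Rule order is an assumption made for this sketch.
function selectPattern(traits: FeatureTraits): Pattern {
  if (traits.memberFacingQA) {
    // AutoRAG only fits when the knowledge base can answer; otherwise use an agent.
    return traits.answerInKnowledgeBase ? "autorag" : "adaptive-agent";
  }
  if (traits.needsAutonomousToolUse) {
    return "adaptive-agent";
  }
  if (traits.needsDeepMarketResearch) {
    return "sonar-deep-research";
  }
  if (traits.stageCount >= 3) {
    return "multi-stage-pipeline"; // pipelines are overkill below three steps
  }
  return "direct-call";
}

// Baseline: a single-prompt task with no special requirements.
const simpleTask: FeatureTraits = {
  memberFacingQA: false,
  answerInKnowledgeBase: false,
  needsAutonomousToolUse: false,
  needsDeepMarketResearch: false,
  stageCount: 1,
};

const seoChoice = selectPattern(simpleTask);
const recipeChoice = selectPattern({ ...simpleTask, stageCount: 13 });
const copilotChoice = selectPattern({
  ...simpleTask,
  memberFacingQA: true,
  answerInKnowledgeBase: true,
});
```

The function does not encode the hot-path rule for agents. In practice that check belongs in review, since it depends on where the feature is called from.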
Section 04

Cost Architecture

The single biggest cost control in the system is the Foundation Layer cache. Here is how it works:

🧠
Foundation Layer (once)
Avatar + Micro-Niche + Paid Frustration run once per Space. Cached in R2 indefinitely. ~$0.08–$0.15 per Space generated. Amortises across every downstream content piece.
Production Layer (per-run)
Briefings, Blockers, Recipes read from the cached Foundation. Perplexity Deep Research per blocker: ~$0.04. Briefing Creator agent per run: ~$0.02–$0.06. Recipe Generator 13-stage: ~$0.10–$0.18.
💬
Interaction Layer (per-query)
Copilot queries via AutoRAG: sub-cent. Vector search is the cheapest compute in the stack. Rate-limited to 3/day per member as a cost + UX lever simultaneously.
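
The per-member daily limit can be sketched as a small counter. This is an illustration only: `DailyRateLimiter` is a hypothetical name, the counts live in memory rather than in a database, and resetting on the UTC calendar day is an assumption.

```typescript
class DailyRateLimiter {
  private limit: number;
  // Query counts keyed by "memberId:YYYY-MM-DD" (in-memory stand-in for storage).
  private counts = new Map<string, number>();

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true and records the query if the member is under today's limit.
  // "Today" is the UTC calendar day of `now`, which is an assumption.
  tryConsume(memberId: string, now: Date): boolean {
    const day = now.toISOString().slice(0, 10);
    const key = `${memberId}:${day}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) {
      return false;
    }
    this.counts.set(key, used + 1);
    return true;
  }
}

// Demo: three queries pass, the fourth is refused, the next day resets.
const limiter = new DailyRateLimiter(3);
const dayOne = new Date("2025-01-01T10:00:00Z");
const dayTwo = new Date("2025-01-02T00:00:01Z");
const dayOneResults = [
  limiter.tryConsume("member-1", dayOne),
  limiter.tryConsume("member-1", dayOne),
  limiter.tryConsume("member-1", dayOne),
  limiter.tryConsume("member-1", dayOne),
];
const nextDayAllowed = limiter.tryConsume("member-1", dayTwo);
const otherMemberAllowed = limiter.tryConsume("member-2", dayOne);
```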
// Token budget enforcement - every stage
Avatar Agent:         maxRounds: 16   maxTokens: 100,000   maxWeb: 6
Briefing Agent:       maxRounds: 10   maxTokens: 80,000    maxWeb: 8
Recipe Stage (each):                  maxTokens: 8,000     temp: 0.2–0.6
Direct Call (SEO):                    maxTokens: 1,500     temp: 0.2
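
A budget like the agent rows above can be enforced with a small tracker that the loop consults before each round. This is a sketch under assumptions, not our production guard: `BudgetTracker` and `runAgentLoop` are hypothetical names, and the simulated loop burns a fixed number of tokens per round where a real agent would stop as soon as its goal is met.

```typescript
interface AgentBudget {
  maxRounds: number;
  maxTokens: number;
  maxWeb: number;
}

// Values copied from the budget listing above.
const AVATAR_BUDGET: AgentBudget = { maxRounds: 16, maxTokens: 100000, maxWeb: 6 };
const BRIEFING_BUDGET: AgentBudget = { maxRounds: 10, maxTokens: 80000, maxWeb: 8 };

class BudgetTracker {
  private budget: AgentBudget;
  rounds = 0;
  tokens = 0;
  webSearches = 0;

  constructor(budget: AgentBudget) {
    this.budget = budget;
  }

  // Checked before each round. The final round can overshoot maxTokens,
  // because usage is only known after the round completes.
  canStartRound(): boolean {
    return this.rounds < this.budget.maxRounds && this.tokens < this.budget.maxTokens;
  }

  canSearchWeb(): boolean {
    return this.webSearches < this.budget.maxWeb;
  }

  recordRound(tokensUsed: number): void {
    this.rounds += 1;
    this.tokens += tokensUsed;
  }

  recordWebSearch(): void {
    this.webSearches += 1;
  }
}

// Simulated loop: runs until the budget stops it, attempting one web search
// per round while searches are still wanted and allowed.
function runAgentLoop(
  budget: AgentBudget,
  tokensPerRound: number,
  searchesWanted: number,
): BudgetTracker {
  const tracker = new BudgetTracker(budget);
  let remainingSearches = searchesWanted;
  while (tracker.canStartRound()) {
    if (remainingSearches > 0 && tracker.canSearchWeb()) {
      tracker.recordWebSearch();
      remainingSearches -= 1;
    }
    tracker.recordRound(tokensPerRound);
  }
  return tracker;
}

const avatarRun = runAgentLoop(AVATAR_BUDGET, 7000, 100); // stopped by the token cap
const briefingRun = runAgentLoop(BRIEFING_BUDGET, 1000, 0); // stopped by the round cap
```

Checking before each round rather than after keeps the guard simple, at the cost of letting the last round run slightly past the token cap.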

Building something similar?

We consult with engineering teams who want to build AI pipelines on Cloudflare. We've made every mistake documented above, and rebuilt it right.
