From wedge to fleet. In six phases.
Narrow proof before horizontal expansion. Discipline over ambition. Compounding moats over static ones.
We do not build a broad platform before proving a single wedge; that is the commonest failure mode of enterprise AI companies. The sequence below is the one we actually follow. Status is live and updated with every merged pull request.
Prove one wedge first
We do not promise you a fleet of specialists. We prove the loop on one narrow, strategic use case.
Compounding over static
When forced to choose, we favour work that compounds — failure analysis, safety-net accumulation, fleet intelligence — over static sophistication.
The moat lives above the substrate
We do not try to out-Groq Groq. We sit above the substrate where infrastructure improvements benefit us without commoditising us.
Phase 0: Proof discipline
Goal
Convert promising results into evidence you can trust.
Deliverables
- Canonical head-to-head benchmark suite for the chosen wedge
- Production-like latency and cost benchmarks on the target serving substrate
- Rubric review for leakage, overfitting, and narrow-task distortion
- Explicit baseline set: frontier flagship, frontier lightweight, current internal model (sketched after this list)
- Competitive benchmark framing vs Groq, Fireworks, Cerebras
- Explicit build-vs-buy framing
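To make "explicit baseline set" concrete, here is a minimal sketch of a head-to-head run. The model identifiers and the `run_eval` helper are hypothetical stand-ins, not real endpoints; the point is that every comparison names its baselines and pins its eval suite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Baseline:
    name: str      # label used in reports
    model_id: str  # identifier on the serving substrate
    role: str      # why this baseline belongs in the set

# The explicit baseline set: no comparison ships without all three.
BASELINES = [
    Baseline("frontier-flagship", "vendor/flagship", "quality upper bound"),
    Baseline("frontier-lightweight", "vendor/light", "latency and cost reference"),
    Baseline("current-internal", "internal/prod", "what we ship today"),
]

def head_to_head(specialist_id: str, eval_suite, run_eval) -> dict:
    """Run the specialist and every baseline on the same frozen eval suite.

    `run_eval(model_id, suite)` is a hypothetical harness call returning a
    score dict; injecting it keeps the comparison substrate-agnostic.
    """
    results = {"specialist": run_eval(specialist_id, eval_suite)}
    for b in BASELINES:
        results[b.name] = run_eval(b.model_id, eval_suite)
    return results
```

Pinning the suite and the baseline set together is what makes a result reproducible by someone other than the author.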
Success measures
- ✓ You can explain exactly why the specialist wins
- ✓ Benchmark results reproducible by someone other than the author
- ✓ Clear separation between interesting experiment and validated signal
Phase 1: Wedge selection and data readiness
Goal
Choose one specialist that is commercially important and operationally feasible.
Deliverables
- Ranked shortlist of candidate specialists with explicit selection criteria (a scoring sketch follows this list)
- Data-readiness review for the winning wedge
- Initial deployment plan in your product flow
- Substrate strategy: owned training, hybrid, or vendor-backed execution
- Primary agent workflow: what the coding agent sees, decides, triggers
- Internal build-vs-buy memo for the wedge
- Adtech domain knowledge base with authoritative sources
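One way the ranked shortlist could be made mechanical, as a sketch: score each candidate against explicit, weighted criteria. The criteria, weights, and example wedges below are assumptions for illustration, not the real selection memo.

```python
# Selection criteria and weights: assumptions for the sketch, not the memo.
CRITERIA = {
    "commercial_importance": 0.4,    # moves a revenue-relevant KPI
    "data_readiness": 0.3,           # training/eval data identified, accessible
    "operational_feasibility": 0.2,  # servable within latency and cost budgets
    "strategic_fit": 0.1,            # compounds with the adtech knowledge base
}

def rank_candidates(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Weighted score per candidate (criterion scores in 0..1), ranked."""
    scored = {
        name: sum(w * scores.get(c, 0.0) for c, w in CRITERIA.items())
        for name, scores in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Two hypothetical wedges, for illustration only.
shortlist = rank_candidates({
    "brand-safety-classifier": {
        "commercial_importance": 0.9, "data_readiness": 0.8,
        "operational_feasibility": 0.7, "strategic_fit": 0.9,
    },
    "creative-copy-scorer": {
        "commercial_importance": 0.7, "data_readiness": 0.4,
        "operational_feasibility": 0.6, "strategic_fit": 0.5,
    },
})
```

Whatever the real weights are, writing them down is what lets the chosen wedge be defended rather than merely asserted.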
Success measures
- ✓ One narrow wedge chosen and defended
- ✓ Training and eval data sources identified and accessible
- ✓ Product owner and technical owner aligned on the same use case
- ✓ Adtech KB seeded with ≥500 documents from authoritative sources
Phase 2: First specialist to production
Goal
Ship your first specialist as a real product capability, not an isolated model demo.
Deliverables
- Retraining loop for the selected specialist
- Accepted eval and promotion gates
- Explicit promotion ladder from candidate to production-accepted (sketched after this list)
- Stable serving interface
- Monitoring for latency, failure modes, and quality drift
- Fallback path to existing model/runtime
- Substrate abstraction separating training and production inference
- Operating surface for your coding agents: resources, commands, policy boundaries
- Human-in-the-loop approval points for judgement gates
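A minimal sketch of the promotion ladder, its gates, and the fallback path. Stage names, thresholds, and the `serve` wrapper are assumptions; the real gates live in the accepted eval policy.

```python
from enum import Enum

class Stage(Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    PILOT = "pilot"
    PRODUCTION = "production"

LADDER = [Stage.CANDIDATE, Stage.SHADOW, Stage.PILOT, Stage.PRODUCTION]

# Each rung's entry gates: explicit thresholds, checked against evidence.
GATES = {
    Stage.SHADOW: {"eval_win_rate": 0.55},
    Stage.PILOT: {"eval_win_rate": 0.60, "p95_latency_ms": 300},
    Stage.PRODUCTION: {"eval_win_rate": 0.60, "p95_latency_ms": 300,
                       "drift_alerts_7d": 0},
}

def may_promote(current: Stage, evidence: dict) -> bool:
    """True only if the evidence clears every gate on the next rung."""
    if current == LADDER[-1]:
        return False  # already production-accepted
    gates = GATES[LADDER[LADDER.index(current) + 1]]
    if evidence.get("eval_win_rate", 0.0) < gates["eval_win_rate"]:
        return False
    if evidence.get("p95_latency_ms", float("inf")) > gates.get("p95_latency_ms", float("inf")):
        return False
    if evidence.get("drift_alerts_7d", 1) > gates.get("drift_alerts_7d", float("inf")):
        return False
    return True

def serve(request, specialist, incumbent):
    """Fallback path: route to the specialist, fall back to the incumbent."""
    try:
        return specialist(request)
    except Exception:
        return incumbent(request)
```

Because every rung's thresholds are written down, a promotion decision is auditable: the evidence either clears the gate or it does not.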
Success measures
- ✓ Specialist beats the agreed frontier baseline on trusted evals
- ✓ Latency low enough to unlock the target workflow
- ✓ Specialist actually used in a bounded production or pilot environment
- ✓ Promotion decisions explicit, auditable, tied to evidence artefacts
Phase 3: Commercial validation
Goal
Prove that the specialist creates real business value.
Deliverables
- A/B or shadow-mode measurement plan (see the shadow-mode sketch after this list)
- Feedback loop for your operators
- Commercial KPI mapping for the wedge
- Failure review process that turns misses into training data
- Dataset, rubric, and policy promotion process
- Value narrative legible to stakeholders, not just builders
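As a sketch of the shadow-mode option, assuming hypothetical `incumbent`, `specialist`, and `log_evidence` callables: the incumbent keeps answering users while the specialist runs alongside it, and both outputs are logged under one request id for offline KPI comparison.

```python
import time
import uuid

def shadow_request(request, incumbent, specialist, log_evidence):
    """Serve from the incumbent; measure the specialist in its shadow."""
    rid = str(uuid.uuid4())

    t0 = time.monotonic()
    served = incumbent(request)  # the user-facing answer, unchanged
    incumbent_ms = (time.monotonic() - t0) * 1000

    t0 = time.monotonic()
    try:
        shadow = specialist(request)  # measured, never shown to users
        specialist_ms = (time.monotonic() - t0) * 1000
        error = None
    except Exception as exc:
        # A miss becomes review material, and eventually training data.
        shadow, specialist_ms, error = None, None, repr(exc)

    log_evidence({  # structured evidence, not anecdote
        "request_id": rid,
        "incumbent": {"output": served, "latency_ms": incumbent_ms},
        "specialist": {"output": shadow, "latency_ms": specialist_ms,
                       "error": error},
    })
    return served
```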
Success measures
- ✓ Measurable improvement in a business-relevant KPI
- ✓ Acceptable operational burden
- ✓ Feedback captured as structured evidence, not anecdote
- ✓ Leadership confidence that the specialist is more than a research asset
Phase 4: Control plane
Goal
Turn your first specialist workflow into a repeatable platform capability.
Deliverables
- Typed orchestration for eval, training, promotion, rollback, recovery
- Accepted model lineage and evidence history (sketched after this list)
- Dataset and reward provenance
- Dashboards and machine-readable health surfaces
- Policy boundaries for autonomous operation
- Runtime abstraction separating control logic from substrate
- Stable MCP and agent-tool interfaces
- Coherent access modes: CLI, API, MCP, dashboard
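A sketch of the control-plane records and policy boundaries, using plain dataclasses as a stand-in for the real typed orchestration layer. Field names, stages, and the transition tables are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceRef:
    kind: str   # e.g. "eval_run", "benchmark", "failure_review"
    uri: str    # where the artefact lives

@dataclass(frozen=True)
class ModelLineage:
    model_id: str
    parent_id: str | None              # the model this one was retrained from
    dataset_ids: tuple[str, ...]       # dataset provenance
    reward_spec_id: str                # reward/rubric provenance
    evidence: tuple[EvidenceRef, ...]  # why each promotion decision was made

# Policy boundary: actions agents may trigger autonomously vs. those that
# stop at a human-in-the-loop judgement gate. Unlisted pairs are denied.
AUTONOMOUS = {("candidate", "train"), ("candidate", "eval"),
              ("production", "rollback")}
HUMAN_GATED = {("shadow", "promote"), ("pilot", "promote")}

def authorise(stage: str, action: str, human_approved: bool = False) -> bool:
    if (stage, action) in AUTONOMOUS:
        return True
    if (stage, action) in HUMAN_GATED:
        return human_approved
    return False  # default-deny keeps the policy boundary explicit
```

Recording lineage and evidence as typed records is what moves key decisions out of undocumented human memory.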
Success measures
- ✓ You can create another specialist without reinventing the operating model
- ✓ Promotion is explicit, auditable, reversible
- ✓ Key decisions no longer depend on undocumented human memory
Phase 5: Specialist fleet expansion
Goal
Expand from one proven wedge to a coordinated set of specialists.
Deliverables
- Expansion plan across adjacent specialist contexts
- Specialist registry and deterministic routing logic (sketched after this list)
- Shared interfaces for invocation, fallback, composition
- Domain-specific eval packs for each new specialist
- Partner/substrate strategy for external inference when useful
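A minimal sketch of the registry and routing deliverable. Names and signatures are illustrative; the key property is that routing is a deterministic lookup, not a model-mediated guess.

```python
from typing import Callable

Specialist = Callable[[str], str]

REGISTRY: dict[str, Specialist] = {}

def register(task: str, specialist: Specialist) -> None:
    """Register exactly one specialist per task; duplicates are a config error."""
    if task in REGISTRY:
        raise ValueError(f"duplicate specialist for task {task!r}")
    REGISTRY[task] = specialist

def route(task: str, payload: str, fallback: Specialist) -> str:
    """Deterministic routing: exact task lookup, shared fallback otherwise."""
    return REGISTRY.get(task, fallback)(payload)
```

Exact-match lookup with a shared fallback keeps fleet behaviour predictable and auditable as new specialists are registered.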
Success measures
- ✓ At least two additional specialists reach the same proof standard
- ✓ Specialists compose cleanly in larger agent workflows
- ✓ Infrastructure remains manageable as fleet size grows
Decision rule
How we choose what to build next.
When choosing between roadmap items, we prefer the one that most improves, in order:
1. Trusted evidence that the moat is real
2. Time to first production specialist
3. Repeatability of retraining and promotion
4. Institutionalisation of knowledge
5. Expansion readiness for a specialist fleet
If a task makes the system more sophisticated but does not improve those five, it is probably not a priority.
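One way to read the rule mechanically, as a sketch: treat the five criteria as an ordered tuple and compare lexicographically, so an improvement on a higher-priority criterion always dominates lower ones. The scores are hypothetical 0-to-1 improvement estimates.

```python
# The five criteria, in strict priority order.
PRIORITY = [
    "trusted_evidence",                # 1. trusted evidence the moat is real
    "time_to_first_production",        # 2. time to first production specialist
    "retraining_repeatability",        # 3. repeatability of retraining/promotion
    "knowledge_institutionalisation",  # 4. institutionalisation of knowledge
    "fleet_expansion_readiness",       # 5. expansion readiness for the fleet
]

def preference_key(improvements: dict[str, float]) -> tuple[float, ...]:
    """Python compares tuples lexicographically, so the highest-priority
    criterion dominates all lower ones."""
    return tuple(improvements.get(c, 0.0) for c in PRIORITY)

def next_item(candidates: dict[str, dict[str, float]]) -> str:
    """Pick the roadmap item whose improvement profile ranks highest."""
    return max(candidates, key=lambda name: preference_key(candidates[name]))
```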
Prioritise compounding moats over static moats.