
Engineering · 2026-03-25

Built to Ship — Encoding Engineering Discipline into an AI Pipeline

How a 10-skill AI development pipeline handles the full software development lifecycle — from daily standup to shipped PR — and why the hard part was never the AI.

There is a growing divide in how developers use AI. On one side, you have what the internet has started calling "vibe coding" — paste a rough idea into an LLM, accept whatever it generates, ship it, and hope for the best. No tests. No review. No architecture. Just vibes.

On the other side, there is something quieter and less viral: AI-augmented engineering. It is the same disciplined process senior developers have followed for years — sprint planning, ticket writing, code review, build gates, security checks — but with AI handling the mechanical execution at a throughput no single person could match.

The difference is not the AI. The difference is the process that wraps it.

At Cymba Labs, we ship production SaaS applications, client projects, and internal tools across a monorepo with 10+ applications. We do this with a development pipeline built on Claude Code — Anthropic's CLI tool for AI-powered development — and a set of 10 custom skills that automate every stage of the development lifecycle.

This is not a story about replacing engineering judgment with AI. It is about encoding that judgment into a repeatable system, and then letting AI execute it at scale.

The Laser Level Analogy

Think about a contractor measuring cuts for a kitchen renovation. A skilled contractor who eyeballs measurements might get it right most of the time. A skilled contractor with a laser level gets it right every time, faster, with less cognitive load on the mechanical parts of the job.

The laser level did not make the contractor less skilled. It freed them to focus on the decisions that actually require expertise — the design, the sequencing, the judgment calls. The measurement itself was never where the value lived.

AI-augmented engineering works the same way. The AI handles the mechanical execution — scaffolding code, running through checklists, formatting tickets, generating boilerplate. The developer handles the architecture, the trade-offs, the edge cases, and the "should we even build this" decisions. The discipline comes before the AI. If your process was sloppy before AI, AI will just make it sloppy faster.

The Pipeline

Every skill in this system maps to something a good engineering team already does. The AI did not invent a new process — it automated an existing one.

  ┌─────────────────────────────────────────────┐
  │             PLANNING PHASE                  │
  │                                             │
  │   /kickoff ──→ /plan-sprint                 │
  │       │              │                      │
  │       ▼              ▼                      │
  │   /spec ──→ /enhance ──→ /decompose         │
  └─────────────────────┬───────────────────────┘
                        │
                        ▼
  ┌─────────────────────────────────────────────┐
  │           EXECUTION PHASE                   │
  │                                             │
  │   /parallel-implement                       │
  │       │         │         │                 │
  │       ▼         ▼         ▼                 │
  │   Worktree  Worktree  Worktree              │
  │   CYM-31    CYM-32    CYM-33                │
  │       │         │         │                 │
  │       ▼         ▼         ▼                 │
  │   /implement  /implement  /implement        │
  └─────────────────────┬───────────────────────┘
                        │
                        ▼
  ┌─────────────────────────────────────────────┐
  │            QUALITY PHASE                    │
  │                                             │
  │   /review ──→ Build/Lint ──→ PR ──→ Merge   │
  └─────────────────────────────────────────────┘

Sitting above all of this is /skill-creator — a meta-skill that builds and evaluates the other skills. We use evaluation workspaces to A/B test skill quality: run the same prompt with and without a skill, compare outputs, and iterate. When a repeated workflow pattern emerges — the same instructions given to Claude Code across multiple sessions — that is when a new skill gets built.

Each skill is defined in a single SKILL.md file with structured frontmatter. They are version-controlled alongside the code they operate on. This is not a separate tooling layer — it lives in the repo.

Planning — Where the Real Work Happens

This is where the gap between vibe coding and serious engineering is widest. Before a single line of code is written, the work goes through structured planning that turns ambiguity into precision.

From Rough Idea to Implementation-Ready Ticket

/spec is a conversational skill that turns a rough idea into a structured Linear ticket. It does not fill out a form — it runs a targeted Q&A to identify what is clear, what is ambiguous, and what is missing. The output is a ticket with acceptance criteria precise enough that /implement can pick it up without asking clarifying questions.

Here is what this looks like in practice. Starting with something vague:

/spec I want to add a waitlist feature to CourtSight
so people can join a waitlist when a court is fully
booked for a timeslot

The skill does not immediately write a ticket. It asks focused questions — which app, what happens when a spot opens, should there be notifications, is there a cap on waitlist size, what does the user see. After 2-3 rounds of Q&A, it produces a structured ticket:

## Overview
Allow players to join a waitlist when a court timeslot
is fully booked. When a cancellation opens a spot,
notify the next player in queue and hold the slot
for a configurable window.

## Acceptance Criteria
- [ ] "Join Waitlist" button appears on fully booked
      timeslots
- [ ] Waitlist respects FIFO ordering
- [ ] Notification sent via WhatsApp when a spot opens
- [ ] Held slot expires after 15 minutes if unclaimed
- [ ] Player can leave the waitlist voluntarily
- [ ] Admin can view and manage waitlist per timeslot

## Technical Approach
- App/Package: apps/courtsight
- Data Model: new `waitlist_entries` table with
  position, status, expiry fields
- Key Components: WaitlistButton, WaitlistPanel,
  waitlist API routes
- Integrations: Twilio WhatsApp (existing pattern
  in lib/whatsapp/)

## Edge Cases & Considerations
- Concurrent cancellations: two spots open
  simultaneously
- Player is already booked for overlapping timeslot
- WhatsApp notification delivery failure — fallback?

## Out of Scope
- Email notifications (WhatsApp only for v1)
- Waitlist analytics/reporting
- Cross-venue waitlisting

The ticket references actual codebase patterns — lib/whatsapp/ for notifications, the existing booking data model for schema design. This is possible because the skill explores the codebase before asking questions. It is not just a prompt template.
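To make the acceptance criteria concrete, here is a minimal sketch of the FIFO-offer and hold-expiry behavior the ticket describes. All types and function names are illustrative assumptions, not the actual CourtSight code:

```typescript
// Hypothetical model of the waitlist behavior from the ticket: offer an
// open spot to the next entry in FIFO order, hold it for 15 minutes,
// and expire unclaimed holds.

type WaitlistEntry = {
  id: string;
  position: number;
  status: "waiting" | "offered" | "claimed" | "expired";
  offerExpiresAt?: Date;
};

const HOLD_WINDOW_MS = 15 * 60 * 1000; // 15-minute hold from the ticket

// Pick the next waiting entry in FIFO order and mark the spot as offered.
function offerSpot(entries: WaitlistEntry[], now: Date): WaitlistEntry | null {
  const next = entries
    .filter((e) => e.status === "waiting")
    .sort((a, b) => a.position - b.position)[0];
  if (!next) return null;
  next.status = "offered";
  next.offerExpiresAt = new Date(now.getTime() + HOLD_WINDOW_MS);
  return next;
}

// Expire any offers whose hold window has lapsed, freeing the spot
// for the next entry in the queue.
function expireLapsedOffers(entries: WaitlistEntry[], now: Date): number {
  let expired = 0;
  for (const e of entries) {
    if (e.status === "offered" && e.offerExpiresAt && e.offerExpiresAt <= now) {
      e.status = "expired";
      expired++;
    }
  }
  return expired;
}
```

The concurrent-cancellation edge case from the ticket would layer on top of this — two simultaneous `offerSpot` calls need a database-level guard (a unique constraint or row lock), which is exactly the kind of detail the Technical Approach section exists to surface.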

Breaking Large Work into Parallel Waves

When a ticket is too large for a single PR, /decompose breaks it into child tasks organized into dependency waves:

# Decomposition: CYM-47 — Waitlist Feature

## Wave 1 — Foundation (3 tasks, all parallel)

  Task 1: Database migration + types
  Complexity: S | apps/courtsight
  - Create waitlist_entries table with RLS policies
  - Generate Supabase types

  Task 2: Core waitlist API routes
  Complexity: M | apps/courtsight
  - POST /api/waitlist (join)
  - DELETE /api/waitlist/:id (leave)
  - GET /api/waitlist/:timeslot (list)
  - Server-side validation + auth

  Task 3: Admin waitlist management panel
  Complexity: M | apps/courtsight
  - WaitlistPanel component on admin timeslot view
  - Reorder / remove capabilities

## Wave 2 — Integration (2 tasks, after Wave 1)

  Task 4: WhatsApp notification on spot open
  Complexity: M | apps/courtsight
  Depends on: Task 1, Task 2
  - Hook into booking cancellation flow
  - Use existing Twilio WhatsApp queue

  Task 5: Player-facing waitlist UI
  Complexity: M | apps/courtsight
  Depends on: Task 1, Task 2
  - "Join Waitlist" button on booked timeslots
  - Waitlist position indicator
  - Leave waitlist action

## Coverage Check
- [x] Join waitlist → Task 2, Task 5
- [x] FIFO ordering → Task 2
- [x] WhatsApp notification → Task 4
- [x] Slot hold + expiry → Task 2, Task 4
- [x] Leave waitlist → Task 2, Task 5
- [x] Admin management → Task 3

The key principles: vertical slices over horizontal layers, each task produces a mergeable PR, and tasks in the same wave do not modify the same files. That last point matters because Wave 1 tasks will run in parallel across separate git worktrees.
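The disjoint-files constraint is mechanical enough to check. A minimal sketch, assuming a task shape that is illustrative rather than taken from the actual skill:

```typescript
// Verify that no two tasks in the same wave touch the same file — any
// hit means they cannot safely run in parallel worktrees.

type Task = { id: string; files: string[] };

function findWaveConflicts(wave: Task[]): Array<[string, string, string]> {
  const conflicts: Array<[string, string, string]> = [];
  for (let i = 0; i < wave.length; i++) {
    for (let j = i + 1; j < wave.length; j++) {
      const shared = wave[i].files.filter((f) => wave[j].files.includes(f));
      for (const f of shared) conflicts.push([wave[i].id, wave[j].id, f]);
    }
  }
  return conflicts;
}
```

An empty result means the wave is safe to fan out; any conflict means either the decomposition needs reshaping or the conflicting tasks belong in different waves.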

/plan-sprint does something similar but at the backlog level — it reads the entire Linear backlog and the codebase state, then produces a prioritized sprint plan of 6-10 tasks organized into waves. /kickoff is the daily standup equivalent — it reviews overnight PR activity, identifies stale work, checks build health, and suggests what to tackle today.

Parallel Execution — The Throughput Multiplier

This is the part that changes the economics of a small team. Instead of implementing tickets sequentially, multiple tickets run simultaneously in isolated environments.

/parallel-implement takes a set of independent ticket IDs and, for each one, creates a separate git worktree branched from main and launches a dedicated terminal window with its own Claude Code instance running /implement.

The worktree creation ensures true isolation:

mkdir -p .worktrees

git worktree add .worktrees/CYM-31 \
  -b CYM-31-add-booking-validation main

git worktree add .worktrees/CYM-32 \
  -b CYM-32-update-pricing-page main

git worktree add .worktrees/CYM-33 \
  -b CYM-33-fix-inbox-race main

Each worktree is a full copy of the repo on its own branch. No shared state between them.

Then, for each ticket, a launcher script is generated and a new terminal window opens:

SCRIPT="/tmp/claude-parallel-CYM-31.sh"
cat > "$SCRIPT" << 'LAUNCH_EOF'
#!/bin/bash
cd /path/to/repo/.worktrees/CYM-31
echo "=================================="
echo " Implementing CYM-31"
echo " Worktree: .worktrees/CYM-31"
echo "=================================="
claude "/implement CYM-31"
echo "Implementation complete."
exec bash
LAUNCH_EOF
chmod +x "$SCRIPT"

open -na Ghostty.app --args --command="bash $SCRIPT"

Within each terminal, /implement runs the full lifecycle autonomously:

  1. Reads the Linear ticket and any parent issues for context
  2. Updates Linear status to "In Progress"
  3. Implements the change, following codebase conventions from CLAUDE.md
  4. Runs pnpm build and pnpm lint — stops immediately on failure
  5. Spawns a self-review subagent (more on this below)
  6. Commits, pushes, and creates a PR via gh pr create
  7. Updates Linear status to "In Review" and links the PR

Three to four tickets running simultaneously, each producing a reviewed PR, typically in under 15 minutes per ticket. In a single work session, a planned sprint wave becomes three or four open PRs ready for human review.

The self-review step inside /implement is particularly important to call out. Before pushing, a separate AI subagent reviews the diff against main, checking for bugs, security issues, type safety regressions, convention violations, and whether the changes actually fulfill the ticket requirements. If it finds blocking issues, the implementation agent fixes them before proceeding. This is the same code review process a team would do — it just happens before the PR is even created.

Quality Gates

Shipping fast without quality gates is just shipping bugs fast. The pipeline has multiple checkpoints that prevent bad code from reaching main.

Self-Review in /implement

Every /implement run includes a Phase 4 self-review. A separate subagent analyzes the diff and produces a structured verdict — PASS or NEEDS CHANGES — with specific findings:

Findings:
- [BLOCKING] app/api/waitlist/route.ts:34
  Missing auth check on POST handler.
  All API routes must verify session.
- [SUGGESTION] components/waitlist-button.tsx:12
  Consider adding loading state during
  API call to prevent double-submission.

Verdict: NEEDS CHANGES (1 blocking)

Blocking findings are fixed before the code is pushed. Suggestions are noted in the PR description. No code reaches the PR stage without passing this review.
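The verdict logic is simple enough to model. Here is a sketch that parses findings in the format shown above into a verdict — the parser itself is an illustrative assumption, not the actual skill internals:

```typescript
// Parse "- [SEVERITY] file:line" findings and derive a verdict:
// any blocking finding forces NEEDS CHANGES.

type Finding = { severity: "BLOCKING" | "SUGGESTION"; location: string };

function parseFindings(report: string): Finding[] {
  const findings: Finding[] = [];
  for (const line of report.split("\n")) {
    const m = line.match(/^\s*-\s*\[(BLOCKING|SUGGESTION)\]\s+(\S+)/);
    if (m) findings.push({ severity: m[1] as Finding["severity"], location: m[2] });
  }
  return findings;
}

function verdict(findings: Finding[]): string {
  const blocking = findings.filter((f) => f.severity === "BLOCKING").length;
  return blocking > 0 ? `NEEDS CHANGES (${blocking} blocking)` : "PASS";
}
```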

The /review Skill

For changes that happen outside of /implement — manual work, exploratory features, or multi-step refactors — /review provides the same structured analysis on demand. It checks:

  • TypeScript type safety — any usage, missing return types, unsafe assertions, unchecked index access (strict mode with noUncheckedIndexedAccess)
  • Security — secrets in code, SQL injection vectors, XSS via unescaped input, missing auth on routes
  • Error handling — unhandled promise rejections, empty catch blocks, missing error boundaries
  • Shared package impact — changes to packages/ that affect consuming apps, breaking API changes
  • Convention compliance — import patterns, color format (oklch), naming conventions from CLAUDE.md

Every finding references a specific file and line number. The output includes a verdict: READY TO MERGE, NEEDS CHANGES, or NEEDS DISCUSSION.

Build and Lint Gates

Both /implement and /review enforce hard stops on build or lint failures. If pnpm build --filter= fails, the process halts and reports the error. There is no "skip the build and push anyway" path. This is enforced by the skill definition itself — the instruction says "if the build fails, stop and report the error. Do not proceed."

This matters because these are not optional guidelines — they are encoded into the system. A developer running /implement at 2am gets the same quality gates as one running it at 10am. The discipline is in the tooling, not the developer's willpower.

Building Skills That Build Software

A common question: how do you build skills for something like this?

Claude Code has a /skill-creator meta-skill specifically for this. Skills are defined in SKILL.md files with structured frontmatter (name, description, triggers) and a detailed body that describes the process step by step. They live in .claude/skills/ in the repo and are version-controlled alongside the code.

The skill definition is essentially a detailed operating procedure — the same kind of runbook a senior engineer would write for a recurring process. The difference is that instead of a human reading and executing the runbook, Claude Code reads and executes it.

.claude/skills/
├── spec/SKILL.md
├── enhance/SKILL.md
├── decompose/SKILL.md
├── implement/SKILL.md
├── parallel-implement/SKILL.md
├── plan-sprint/SKILL.md
├── kickoff/SKILL.md
├── review/SKILL.md
├── cymba-designer/SKILL.md
└── skill-creator/SKILL.md
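As a concrete illustration, here is roughly what a SKILL.md could look like. The field values and body below are hypothetical — they are not the contents of any actual skill in the list above:

```markdown
---
name: review
description: Structured code review of the current branch's diff against main
triggers:
  - /review
---

# Process

1. Diff the current branch against main.
2. Check type safety, security, error handling, shared package
   impact, and convention compliance (per CLAUDE.md).
3. Emit findings with file:line references and a verdict:
   READY TO MERGE, NEEDS CHANGES, or NEEDS DISCUSSION.
```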

Iterating on Skill Quality

Building a skill is step one. Making it reliably good is step two — and honestly, this is an area where we are still building out the process.

The pipeline includes evaluation workspaces for three skills (/spec, /decompose, /enhance). Each workspace contains test scenarios with expected outputs. For /spec, the test prompts range from intentionally vague ("add a waitlist feature") to highly detailed ("here are exact requirements for a tournament bracket system"). The eval runs the skill against each prompt and compares the output to a baseline run without the skill.

spec-workspace/iteration-1/
├── eval-1-waitlist-vague/
│   ├── with_skill/outputs/
│   └── without_skill/outputs/
├── eval-2-tournament-detailed/
│   ├── with_skill/outputs/
│   └── without_skill/outputs/
└── eval-3-email-notifications/
    ├── with_skill/outputs/
    └── without_skill/outputs/

This infrastructure exists and works. The next step is making it a regular practice — running evals after skill changes, tracking quality metrics over time, and expanding eval coverage to higher-stakes skills like /implement and /review.

The signal for when to create a new skill is simple: if the same instructions are given to Claude Code more than twice, those instructions should be a skill. The signal for when to improve a skill is also simple: if a skill produces output that consistently needs editing or overriding, the skill needs iteration.

What Is Missing — Honest Gaps

No system is complete, and pretending otherwise would undermine the credibility of everything above. Here is what we are actively working on:

Post-Parallel Merge Conflicts

When /parallel-implement creates three PRs from independent worktrees, they all branch from the same main commit. Merging the first PR is clean. The second often has conflicts from the first. The third is worse.

Right now, this is handled with manual prompting — rebasing branches, resolving conflicts case by case. It works, but it is not encoded into a skill. The next step is /merge-wave — a skill that analyzes file overlap across parallel PRs, determines optimal merge order (most isolated first), rebases remaining branches after each merge, and runs build/lint checks on each rebased branch.

The key design decision: the skill will present conflicts with full context from both tickets, but it will not auto-resolve them. Conflict resolution requires understanding the intent of both changes — that is an engineering judgment call, not a mechanical step.
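The "most isolated first" ordering heuristic can be sketched directly. Since /merge-wave does not exist yet, this is an illustrative model, not its implementation:

```typescript
// Order parallel PRs for merging: count how many files each PR shares
// with the others, then merge the least-entangled PR first.

type Pr = { id: string; files: string[] };

function mergeOrder(prs: Pr[]): string[] {
  const overlap = (a: Pr) =>
    prs
      .filter((b) => b.id !== a.id)
      .reduce((n, b) => n + a.files.filter((f) => b.files.includes(f)).length, 0);
  return [...prs].sort((a, b) => overlap(a) - overlap(b)).map((p) => p.id);
}
```

After each merge, the overlap counts for the remaining branches would be recomputed against the new main — which is why the skill rebases and re-checks after every step rather than planning the whole sequence up front.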

Security Scanning

The /review skill checks the diff for obvious security issues — exposed secrets, SQL injection vectors, missing auth checks. But it only looks at the diff, not the full codebase, and it only runs when someone invokes it.

What is missing is CI-level automated security scanning: dependency vulnerability scanning (Dependabot), secret detection across git history (TruffleHog), and static analysis for common vulnerability patterns (CodeQL). These should run on every PR automatically, not just when someone remembers to invoke a skill.

We are also building a /security-review skill that does a full codebase audit — checking all API routes for auth coverage, validating Supabase RLS policies, scanning for unsafe HTML rendering, and verifying environment variable exposure. The difference from /review is scope: /review checks the diff, /security-review checks the whole app.

Testing Automation

This is the biggest gap. Most apps in the monorepo do not have test suites. The /implement skill has a "run tests if they exist" step, but when tests do not exist, that step is a no-op.

The next addition is a /test-gen skill that generates tests for changed code using Vitest — unit tests for utilities and server actions, integration tests for API routes using MSW, and component tests using React Testing Library. The goal is not 100% coverage — it is a baseline that prevents regressions and catches obvious bugs.

On the CI side, we are adding a test workflow that only runs tests for packages affected by the PR's changes, using Turborepo's filtered execution. Coverage thresholds will start low and ratchet up as the test suite grows.
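A minimal sketch of what that affected-only CI job could look like. The workflow name and trigger are assumptions, and Node/pnpm setup steps are elided:

```yaml
name: test-affected
on: pull_request

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so origin/main is available for diffing
      # Node and pnpm setup steps elided for brevity.
      - run: pnpm install --frozen-lockfile
      # Turborepo's changed-since filter: run `test` only for packages
      # modified relative to main, plus anything that depends on them.
      - run: pnpm turbo run test --filter=...[origin/main]
```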

Post-Merge Automation

Currently, when a PR merges, nothing happens automatically. The Linear issue stays in "In Review" until manually updated. No changelog is generated. No deployment notification is sent.

This is straightforward to fix with GitHub Actions — extract the Linear issue ID from the PR title, update the issue status to "Done" via the Linear API, and append a changelog entry. Vercel handles deployment on merge, so the remaining gap is just notification and bookkeeping.
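The first step of that action — pulling the issue ID out of a merged PR's title — is a one-liner. A sketch, assuming the PR title convention shown elsewhere in this post (e.g. "CYM-47: Waitlist Feature"); the team prefix and format are assumptions:

```typescript
// Extract a Linear-style issue identifier (e.g. "CYM-47") from a PR
// title so a post-merge workflow can update the issue via the API.

function extractIssueId(prTitle: string): string | null {
  const m = prTitle.match(/\b[A-Z]{2,}-\d+\b/);
  return m ? m[0] : null;
}
```

The returned ID would then feed the Linear API call that moves the issue to "Done" and the changelog append.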

Process Discipline

/kickoff and /plan-sprint exist and work well, but they are not used as consistently as they should be. Starting every session with /kickoff and running /plan-sprint weekly would make the entire pipeline more systematic. The tooling is there — the habit is not. The best tools in the world do not help if you do not use them.

The Bottleneck Shifted

A year ago, the bottleneck was implementation speed. How fast could the code be written, the API wired up, the UI built? That bottleneck is gone. With parallel worktrees and autonomous implementation, the raw throughput of code production is no longer the constraint.

The bottleneck now is upstream and downstream. Upstream: are we specifying the right thing? Are the tickets precise enough? Is the architecture sound before the AI starts building? Downstream: is the review process catching real issues? Are the quality gates rigorous enough? Is the merged code actually correct?

These are the same questions every engineering team faces. AI did not change the questions. It shifted where the constraint lives — from "can we build it fast enough" to "can we specify, review, and verify it well enough."

That is the distinction. Vibe coding removes the process and lets AI fill the void. AI-augmented engineering keeps the process and lets AI accelerate it. One ships fast. The other ships fast and ships right.

The AI is the easy part. The engineering is the hard part. It always was.