Vibe coding had its moment. You open a chat, describe what you want in plain English, and an AI agent writes the code. For prototypes and weekend projects, it’s genuinely fun. For production systems, it’s a trap.
The numbers bear this out. A CodeRabbit analysis of 470 GitHub pull requests found that AI-co-authored code contained 1.7x more major issues than human-written code, with security vulnerabilities appearing at 2.74x the rate. And that’s from developers who were at least reviewing the output. The vibe coding ethos — “just trust it and ship” — makes those numbers worse, not better.
I’ve spent the last several months moving from ad-hoc agent prompting to what’s now being called spec-driven development (SDD). The core idea is simple: instead of describing what you want conversationally and hoping the agent infers the rest, you write a specification first, then let agents implement against it. The specification becomes the source of truth. Code becomes a generated artifact.
This isn’t a new concept — it’s how software was built before IDEs made it easy to skip the design phase. What’s new is that AI agents make it practical to maintain the discipline of specification writing without paying the traditional cost of manually translating specs into code.
Why Ad-Hoc Prompting Breaks Down
The fundamental problem with conversational prompting is context loss. When I type “add a user settings page with email preferences,” the agent doesn’t know my authentication model, my state management approach, whether I use server-side or client-side rendering for this type of page, or what “email preferences” means in my domain. It’ll produce something that compiles and looks reasonable, but the architectural decisions it makes will be locally sensible and globally wrong.
I’ve hit this wall repeatedly. An agent adds a React component that manages its own state when the rest of the app uses a centralized store. It introduces a new API endpoint that duplicates an existing one because it never read the route definitions. It picks a date library the project doesn’t use. Each fix is small, but the cumulative cost of correcting these decisions — what I’ve previously called the correction tax — can erase the productivity gain entirely.
The deeper issue is that prompts are ephemeral. Once the agent finishes a task, the reasoning behind it is lost. If another agent (or the same agent in a new session) touches the same code later, it has no record of why specific decisions were made. The result is architectural drift at a pace manual coding could never match.
Specifications as Persistent Context
A specification solves these problems by making decisions explicit and durable. Before any agent writes code, I produce a short document that captures what I’m building, how it fits into the existing system, and what constraints apply. This isn’t a formal requirements document — it’s a focused artifact that gives the agent enough context to make consistent decisions.
Here’s a real example. I needed to add webhook delivery tracking to a notification service. Instead of prompting an agent directly, I wrote this:
# Webhook Delivery Tracking
## Context
The notification service (services/notifications/) currently fires webhooks
via WebhookDispatcher but has no visibility into delivery success/failure.
Customers report missing webhooks with no way for us to investigate.
## Requirements
- Record every webhook delivery attempt with status, response code, and latency
- Store delivery records in the existing PostgreSQL database (not a new store)
- Expose delivery history through the existing REST API under /webhooks/{id}/deliveries
- Retry failed deliveries up to 3 times with exponential backoff
- Follow the existing service patterns: repository → service → controller
## Constraints
- No new dependencies — use the existing HTTP client (got) and ORM (Prisma)
- Delivery records must be prunable — add a retention policy field
- Must not block the webhook dispatch path — tracking writes are async
## Out of Scope
- Dashboard UI (separate task)
- Alerting on repeated failures (separate task)
- Changes to the webhook registration flow
This document took me fifteen minutes to write. But it eliminates an entire class of agent mistakes: wrong database choice, wrong architectural layer, unnecessary dependencies, scope creep into the UI. The agent implements against a clear target instead of inferring one from a vague prompt.
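To make the retry and async-tracking requirements concrete, here is a minimal sketch of what an agent might produce against that spec. The names (`dispatchWithTracking`, `backoffMs`) and the 1-second base delay are my own illustrative choices, not from the real service — the spec only says "exponential backoff":

```typescript
// Hypothetical sketch of the spec's retry policy: up to 3 attempts with
// exponential backoff, and tracking writes that never block dispatch.

type DeliveryStatus = "pending" | "success" | "failed";

interface DeliveryAttempt {
  webhookId: string;
  attemptNumber: number; // 1-3 per the spec
  status: DeliveryStatus;
  responseCode: number | null; // null when the request never completed
  latencyMs: number | null;
}

// Exponential backoff: 1s, 2s, 4s for attempts 1-3 (base delay is a guess).
function backoffMs(attemptNumber: number, baseMs = 1000): number {
  return baseMs * 2 ** (attemptNumber - 1);
}

async function dispatchWithTracking(
  webhookId: string,
  send: () => Promise<number>, // performs the HTTP call, returns status code
  record: (a: DeliveryAttempt) => Promise<void>, // persists a delivery record
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const start = Date.now();
    let responseCode: number | null = null;
    try {
      responseCode = await send();
    } catch {
      responseCode = null; // network failure: no status code to record
    }
    const ok = responseCode !== null && responseCode < 400;
    // Fire-and-forget: per the spec, tracking must not block dispatch,
    // and a failed tracking write must never fail the delivery itself.
    void record({
      webhookId,
      attemptNumber: attempt,
      status: ok ? "success" : "failed",
      responseCode,
      latencyMs: Date.now() - start,
    }).catch(() => {});
    if (ok) return true;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
  return false;
}
```

The point isn't this particular code — it's that every decision in it (attempt cap, async write, no new dependencies) traces back to a line in the spec, so a reviewer can check each one.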
The Workflow in Practice
My current workflow has four phases, and they map onto how I’ve always done engineering — just with different tools at each step.
Phase 1: Research. I use the agent to explore the existing codebase before writing the spec. This is where agents are underutilized. I’ll ask it to map out the current module structure, find similar patterns in the codebase, and identify interfaces I need to conform to. This exploration feeds the specification.
Phase 2: Specify. I write the spec myself. This is the phase where engineering judgment matters most — scoping the work, choosing the approach, identifying constraints. The agent can help draft the spec, but I make the decisions. This is the “measure twice” part.
Phase 3: Implement. The agent works against the spec. For larger features, I break the spec into sequential tasks, each building on the previous one. The agent implements each task, runs the tests, and I review before moving to the next.
# Phase 1: Research
claude "Read services/notifications/ and map out the current architecture.
How are webhooks dispatched? What patterns do existing features follow?"
# Phase 3: Implement against the spec
claude "Implement the webhook delivery tracking feature as specified in
docs/specs/webhook-tracking.md. Start with the database schema
and repository layer. Run existing tests after each change to
verify nothing breaks."
Phase 4: Validate. I review the implementation against the spec, not against my mental model of what the code should look like. This is a subtle but important distinction. When I review ad-hoc agent output, I’m simultaneously figuring out what it was trying to do and whether it did it well. When I review against a spec, I already know the intent — I’m only evaluating execution.
Parallel Agents with Shared Specs
The real productivity unlock comes when you combine specs with parallel execution. A single specification can be decomposed into independent work streams that different agents handle simultaneously, each in an isolated git worktree.
I used this recently for a feature that touched three layers: database schema, API endpoints, and background job processing. The spec defined clear interfaces between the layers, so each agent could work independently:
## Interface Contract
DeliveryRecord schema:
- webhookId: string (FK to webhooks table)
- attemptNumber: number (1-3)
- status: 'pending' | 'success' | 'failed'
- responseCode: number | null
- latencyMs: number | null
- createdAt: timestamp
Repository interface:
- create(record: DeliveryRecord): Promise<DeliveryRecord>
- findByWebhookId(id: string, limit?: number): Promise<DeliveryRecord[]>
- deleteOlderThan(date: Date): Promise<number>
Three agents worked in parallel: one on the database migration and repository, one on the API controller and route registration, one on the retry logic and background worker. Because the spec defined the interfaces, there were no conflicts when merging. Each agent knew exactly what shape the data took and what functions would be available.
This is impossible with ad-hoc prompting. Without a shared spec, each agent invents its own interfaces, naming conventions, and data shapes. The merge step becomes a rewrite.
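For a sense of how little ambiguity the contract leaves, here is the same contract rendered as TypeScript, plus the kind of in-memory stub one agent can code against while another builds the real Prisma-backed version. This is a hypothetical rendering — the actual project would derive `DeliveryRecord` from its Prisma schema rather than declare it by hand:

```typescript
// The interface contract from the spec, as TypeScript declarations.
type DeliveryStatus = "pending" | "success" | "failed";

interface DeliveryRecord {
  webhookId: string; // FK to the webhooks table
  attemptNumber: number; // 1-3
  status: DeliveryStatus;
  responseCode: number | null;
  latencyMs: number | null;
  createdAt: Date;
}

interface DeliveryRepository {
  create(record: DeliveryRecord): Promise<DeliveryRecord>;
  findByWebhookId(id: string, limit?: number): Promise<DeliveryRecord[]>;
  deleteOlderThan(date: Date): Promise<number>;
}

// A throwaway in-memory stub so the API and worker agents aren't blocked
// on the database agent finishing first.
class InMemoryDeliveryRepository implements DeliveryRepository {
  private records: DeliveryRecord[] = [];

  async create(record: DeliveryRecord): Promise<DeliveryRecord> {
    this.records.push(record);
    return record;
  }

  async findByWebhookId(id: string, limit = 50): Promise<DeliveryRecord[]> {
    return this.records.filter((r) => r.webhookId === id).slice(0, limit);
  }

  async deleteOlderThan(date: Date): Promise<number> {
    const before = this.records.length;
    this.records = this.records.filter((r) => r.createdAt >= date);
    return before - this.records.length;
  }
}
```

Because all three agents compile against the same `DeliveryRepository` shape, the merge is mechanical: swap the stub for the real implementation.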
Guardrails That Actually Work
Specs handle the “what to build” problem. Guardrails handle the “what not to do” problem. I’ve found that effective guardrails operate at three levels:
Tool-level restrictions control what the agent can touch. For implementation tasks, I restrict file system writes to the relevant directories. For review tasks, I remove write permissions entirely. This isn’t about trust — it’s about limiting blast radius when the agent misinterprets the scope.
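As one concrete mechanism: Claude Code reads permission rules from a project-level `.claude/settings.json`. The allow/deny structure below is real, but the exact rule patterns reflect my understanding of the syntax — verify against the current documentation before relying on them:

```json
{
  "permissions": {
    "allow": [
      "Edit(services/notifications/**)",
      "Bash(npm test:*)"
    ],
    "deny": [
      "Edit(.env*)",
      "Bash(git push:*)"
    ]
  }
}
```

With a scope like this, an agent that misreads the task can still only damage the directory the task lives in.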
Test-driven validation catches functional regressions in real time. The agent runs the test suite after every change. If tests fail, it fixes the issue before moving on. This creates a tight feedback loop that converges on correct implementations faster than post-hoc review.
Spec compliance checks are the layer most teams skip. After the agent finishes, I explicitly ask it to compare its implementation against the original specification and flag any deviations. This catches scope creep, missed requirements, and constraint violations that tests alone won’t surface.
claude "Compare the implementation in services/notifications/webhooks/
against docs/specs/webhook-tracking.md. Flag any requirements
that weren't implemented, any constraints that were violated,
and any functionality that was added beyond what the spec calls for."
That last category — functionality added beyond the spec — is surprisingly common. Agents are eager to be helpful, and they’ll add error handling, logging, or convenience methods that weren’t requested. Sometimes that’s fine. Often it introduces complexity that makes the code harder to maintain. The spec gives you a clear standard to evaluate against.
The Role Shift Is Real
Spec-driven development changes what I do every day. I write less code and more prose. I spend more time in research and design, less in implementation. My review process is faster because I’m checking execution against a known target instead of reverse-engineering intent from code.
This isn’t the “AI replaces developers” narrative. It’s closer to how a senior architect works with a development team: define the approach, set the constraints, delegate the implementation, review the results. The difference is that the “team” can spin up in seconds, works in parallel without coordination overhead, and doesn’t need onboarding.
The engineers who’ll thrive with these tools are the ones who can write clear specifications — which means understanding the problem domain, the existing system, and the trade-offs well enough to make decisions before code exists. That’s always been the hard part of software engineering. Now it’s the only part that isn’t automatable.