A few years ago, I watched a deployment go out on a Friday afternoon. The code had passed every automated check — linting, unit tests, integration tests, security scans. A developer merged to the main branch, and continuous deployment did exactly what it was designed to do: shipped the change to production within minutes. The change broke search results for a specific category of listings that the test suite didn’t cover. The issue wasn’t caught for two hours because the person who merged it had already context-switched to something else. The pipeline worked perfectly. The design was the problem.
Most deployment literature assumes continuous deployment is the end-state and anything slower is technical debt. I’ve come to believe the opposite for certain systems. For high-traffic, data-serving applications where a bad deploy is immediately visible to thousands of users, deliberate friction between environments is an engineering feature — not an organizational failure.
The Continuous Deployment Assumption
Continuous deployment has become the default aspiration of modern engineering teams, and the DORA metrics reinforce it. Deployment frequency and lead time for changes are headline indicators. The implicit message: if you’re not deploying to production on every merge, you’re behind.
This is genuinely correct for many systems. SaaS products with mature feature flag infrastructure, internal tools where downtime is an inconvenience rather than a revenue event, microservices with strong contract testing — continuous deployment works well in these contexts. When rollback is cheap, blast radius is small, and your test suite covers the surface area that matters, speed is the right optimization target.
But continuous deployment assumes things that aren’t always true. It assumes your test suite is comprehensive enough that passing tests means production-ready. It assumes rollback is fast and free of side effects. It assumes someone is watching when the deployment lands. And it assumes the cost of a bad deploy is low enough to absorb as part of normal operations.
For systems that serve high-traffic content — where broken rendering, bad data, or degraded search results are immediately visible to thousands of users — the cost of a bad deploy is not “roll back in five minutes.” It’s user-visible degradation, potential SEO impact, and support tickets that take longer to resolve than the deployment took to ship. These costs don’t show up in DORA metrics, but they’re real. The “move fast and break things” ethos was coined at a company with thousands of engineers and a sophisticated experimentation platform. Transplanting it to a ten-person team serving millions of page views is cargo culting, not engineering.
What a Multi-Stage Pipeline Actually Looks Like
The pipeline I run uses four environments with manual promotion gates between them. The design is intentionally slower than continuous deployment, and the slowness is the point.
Feature branches get automated validation on every push. GitHub Actions runs linting, unit tests, integration tests, and security audits. This feedback is fast and fully automated — there’s no friction at this stage because rapid iteration on a feature branch is pure upside.
# .github/workflows/pr-validation.yml
name: PR Validation

on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: composer install --no-interaction
      - name: Lint
        run: vendor/bin/php-cs-fixer fix --dry-run --diff
      - name: Static analysis
        run: vendor/bin/psalm --no-cache
      - name: Unit tests
        run: vendor/bin/phpunit --testsuite unit
      - name: Integration tests
        run: vendor/bin/phpunit --testsuite integration
When a PR merges to the development branch, a CI job automatically builds and deploys to the test environment. This is where the team and QA see changes in a real environment for the first time.
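A minimal sketch of what that trigger can look like, assuming GitHub Actions, a branch named development, and the same ./deploy.sh helper invoked later in the pipeline (the workflow name, branch name, and build steps here are illustrative, not the exact configuration):

```yaml
# .github/workflows/deploy-test.yml
# Hypothetical workflow: branch name and build steps are assumptions.
name: Deploy to Test

on:
  push:
    branches: [development]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: composer install --no-interaction --no-dev
      - name: Deploy to test environment
        run: ./deploy.sh test "$GITHUB_SHA"
```

No human in the loop here: every merge to development lands in the test environment within minutes.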
Promotion from test to staging is a manual action — the first gate. Someone has to decide that the batch of changes in the test environment is ready for a more production-like environment. Staging runs against production-like data and configuration, with the same infrastructure topology as production.
Promotion from staging to production is the second gate. In Jenkins, this is a parameterized job with an explicit confirmation step:
// Jenkinsfile — production deployment
pipeline {
    agent any

    parameters {
        string(name: 'DEPLOY_VERSION', description: 'Version tag to deploy')
        booleanParam(name: 'STAGING_VERIFIED',
                     defaultValue: false,
                     description: 'I have verified this version in staging')
    }

    stages {
        stage('Gate Check') {
            steps {
                script {
                    if (!params.STAGING_VERIFIED) {
                        error('Deployment blocked: staging verification not confirmed')
                    }
                }
            }
        }
        stage('Deploy') {
            steps {
                sh "./deploy.sh production ${params.DEPLOY_VERSION}"
            }
        }
        stage('Smoke Test') {
            steps {
                sh './smoke-test.sh production'
            }
        }
    }
}
The STAGING_VERIFIED boolean is crude, but it works. It forces the deployer to attest that they actually looked at the changes in staging before shipping to production. That conscious decision — opening Jenkins, selecting the version, confirming the checkbox, clicking deploy — changes behavior in ways that “merge to main” doesn’t.
Where the Friction Creates Value
The manual gates provide three types of value that automated checks can’t replicate.
Batch review. When changes accumulate in the test environment before promotion to staging, someone reviews them as a group. This surfaces interaction effects that per-PR review misses. Two individually correct changes can conflict in production: a caching adjustment and a data model migration, a UI layout change and a modified API response shape. The promotion decision is the moment where someone asks “do all of these changes work together in a real environment?” — a question that unit tests can’t answer.
Temporal buffer. Time between merge and production is not wasted time — it’s observation time. Some issues take time to manifest: a slow memory leak that only becomes visible after 30 minutes of traffic, a database query that performs fine on test data but degrades under production-scale load, a race condition in background job processing that surfaces once every few hundred requests. A multi-stage pipeline gives those problems time to appear in environments where they’re observable but not customer-impacting.
Cognitive forcing function. When deploying to production requires deliberate action — not just merging a PR — the deployer reviews what they’re shipping. They know what’s in the batch. They’ve seen it in staging. They’re making a conscious decision that this set of changes is ready. When deployment is a side effect of merging, the deployer might not even know what else merged to main that day.
The cost is real: a change that could reach production in ten minutes with continuous deployment might take a few hours with manual gates. For the systems I run — high-traffic, content-serving, revenue-generating — that trade-off is correct. The cost of a bad production deploy exceeds the cost of a few hours of deployment latency by orders of magnitude.
Automating the Right Parts
This is not an argument against automation. The friction is strategic and targeted — it exists only at the promotion boundaries between environments. Everything else should be automated and fast.
PR validation: automated. Build and deploy to the test environment: automated. Smoke tests after each deployment: automated.
#!/bin/bash
# smoke-test.sh — verify deployment health
set -euo pipefail

# Takes an environment name (matching the Jenkinsfile's `./smoke-test.sh production`)
# and resolves it to a base URL. The domains here are placeholders.
ENVIRONMENT=$1
case "$ENVIRONMENT" in
    production) BASE_URL="https://example.com" ;;
    *)          BASE_URL="https://${ENVIRONMENT}.example.com" ;;
esac

echo "Running smoke tests against $BASE_URL"

# Critical endpoints must respond
curl -sf "$BASE_URL/api/health" | jq -e '.status == "ok"'
curl -sf "$BASE_URL/api/listings?limit=1" | jq -e '.results | length > 0'

# Response time within acceptable bounds
RESPONSE_TIME=$(curl -sf -o /dev/null -w '%{time_total}' "$BASE_URL/api/listings")
if (( $(echo "$RESPONSE_TIME > 2.0" | bc -l) )); then
    echo "FAIL: Response time ${RESPONSE_TIME}s exceeds 2s threshold"
    exit 1
fi

echo "All smoke tests passed"
If smoke tests fail in the test environment, staging promotion is blocked automatically. The manual gate is not a replacement for automated checks — it’s an additional layer on top of them. Automation answers “is this correct?” Manual gates answer “should this ship now?” These are different questions, and conflating them is where pipelines go wrong.
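That automatic block can be as simple as a promotion script that refuses to run while the source environment is unhealthy. A sketch of the gating logic, with the real smoke-test and deploy calls replaced by stubs so it stands alone (all names here are illustrative):

```shell
#!/bin/bash
# Sketch: staging promotion blocked on failing smoke tests.
# smoke_test and deploy are stand-ins for the real smoke-test.sh and
# deploy.sh scripts; SMOKE_STATUS simulates the environment's health.
set -euo pipefail

smoke_test() {
    # Real version would shell out to ./smoke-test.sh "$1"
    [ "${SMOKE_STATUS:-fail}" = "ok" ]
}

deploy() {
    # Real version would shell out to ./deploy.sh "$1" "$2"
    echo "deploying version $2 to $1"
}

promote_to_staging() {
    local version=$1
    # The gate: refuse to promote while the test environment is unhealthy
    if ! smoke_test test; then
        echo "Promotion blocked: smoke tests failing in test environment" >&2
        return 1
    fi
    deploy staging "$version"
}
```

In the real pipeline the stubs shell out to the actual scripts, but the shape of the gate is the same: an exit code from the smoke suite decides whether the promotion command runs at all.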
When This Is Wrong
Manual promotion gates are the wrong choice in several well-defined contexts.
If your team has strong feature flag infrastructure where deployment and release are decoupled, the safety comes from gradual rollout — not from pre-deployment verification. Ship dark, enable incrementally, observe. That model is better if you’ve invested in the tooling to support it.
If speed of iteration is your primary competitive advantage — early-stage products, internal tools, experimental features — the cost of a bad deploy is low and the cost of slow iteration is high. Optimize for speed.
If the gates become rubber stamps — the deployer clicks “staging verified” without actually checking — the system provides false confidence, which is worse than no gate at all. The process only works if the team values it and treats the gates as meaningful decision points.
If your team is large enough that manual coordination becomes a serialization bottleneck, you need automated safety mechanisms — canary deployments, progressive rollouts, feature flags — that provide safety without blocking on a human. Manual gates work for small-to-medium teams where the person deploying has context on what’s being shipped. At scale, that assumption breaks down.
I want to be explicit about the scope: this pattern works for small-to-medium teams running high-stakes, content-serving systems where the cost of a bad production deploy is high and the team size makes manual verification tractable. It is not a universal prescription.
The Speed That Matters
The metric I care about is not “how fast can code reach production.” It’s “how reliably does production serve users.” A pipeline that ships changes in five minutes but causes a user-visible degradation every two weeks is slower in aggregate than one that ships in four hours and causes a degradation once a quarter. Reliability compounds just like deployment speed does — the difference is that reliability compounds in your favor.
The deployment pipeline is not just a technical artifact. It encodes the team’s values about risk, quality, and responsibility. Choosing to add friction is a statement that reliability matters more than velocity in this specific context. That’s an engineering judgment, not a process failure.
My pipeline has been running for years. Changes reach production the same day they merge, usually within a few hours. The manual gates add minutes of human time per deployment — not days, not even hours. And the team ships with confidence, because every production deployment is a deliberate act, not a side effect of merging a pull request.