
Self-Hosted GitHub Actions Runners That Scale to Zero


GitHub’s attempt to charge $0.002 per minute for self-hosted runner usage in March 2026 — and the swift community backlash that shelved it — brought a useful question into sharp focus: what exactly are you paying for when you run CI/CD, and who controls that cost? The fee was modest, but the principle mattered. If your runner infrastructure runs on your own hardware, paying a platform tax for job orchestration changes the economics in ways that compound at scale. I’ve been running a custom runner system on AWS for a while now, and the architecture has held up well enough that I think the patterns are worth sharing. The system scales to zero when idle, provisions ephemeral instances per job, and costs a fraction of GitHub-hosted runners during active use.

Why Self-Host at All

There are three reasons I moved away from GitHub-hosted runners, and cost is only one of them.

Cost is the most measurable. GitHub-hosted Linux runners cost $0.008 per minute. An m5.xlarge spot instance on AWS costs roughly $0.04 per hour — about $0.0007 per minute. That’s more than a 10x difference in raw compute cost. When you’re running hundreds of CI jobs per day across dozens of repositories, that gap adds up to real money. And when no jobs are running, the self-hosted system costs nothing because there are no instances running.
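Those per-minute figures are worth sanity-checking. A quick sketch using the numbers above (spot prices fluctuate by region and over time, so treat them as illustrative, and the 300-jobs-a-day workload is a made-up example):

```javascript
// Per-minute cost comparison using the figures from the text
const githubPerMinute = 0.008;           // GitHub-hosted Linux runner
const spotPerHour = 0.04;                // m5.xlarge spot, approximate
const spotPerMinute = spotPerHour / 60;  // ≈ $0.000667

const ratio = githubPerMinute / spotPerMinute; // ≈ 12x

// Hypothetical workload: 300 jobs/day at 10 minutes each
const minutesPerDay = 300 * 10;
const dailySavings = minutesPerDay * (githubPerMinute - spotPerMinute);

console.log(ratio.toFixed(1));        // ~12x cheaper per minute
console.log(dailySavings.toFixed(2)); // ~$22/day at this volume
```

The gap widens further once you account for GitHub billing in full-minute increments while EC2 bills per second.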

Performance is the reason engineers actually care about. GitHub-hosted runners use shared hardware with cold caches. Build tools, package managers, and Docker layers start from scratch every time. On a self-hosted runner with a pre-built AMI that includes your toolchain, the runner is ready in under two minutes and your build caches are warm from the image. Teams I’ve worked with report builds running 2-5x faster after switching, and I’ve seen similar numbers.

Control is the reason that’s hardest to quantify but matters most in regulated environments. Self-hosted runners can live inside your VPC, access private resources without public exposure, run custom tooling that’s pre-installed on the AMI, and meet compliance requirements that GitHub’s shared infrastructure can’t satisfy.

The trade-off is real: you’re now responsible for infrastructure. But if you’re already running workloads on AWS and have the operational muscle for Lambda functions and EC2, the marginal overhead is manageable.

The Architecture: Webhook to SQS to Lambda to EC2

The system is event-driven from end to end. No polling, no idle workers, no wasted compute.

GitHub webhook (workflow_job)
  → API Gateway + Lambda (webhook handler)
    → SQS queue (job buffer)
      → Lambda (runner provisioner)
        → EC2 instance (ephemeral runner)
          → Self-terminates after job completes

The webhook handler is a lightweight Lambda function behind API Gateway. When GitHub sends a workflow_job event with a queued action, the handler validates the HMAC-SHA256 signature, checks that the job’s labels match the runner’s configured labels, and drops the job metadata onto an SQS queue. Everything else — completed jobs, in-progress updates, label mismatches — gets acknowledged with a 200 and ignored.

import { createHmac, timingSafeEqual } from "node:crypto";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;
const JOB_QUEUE_URL = process.env.JOB_QUEUE_URL;
const RUNNER_LABELS = (process.env.RUNNER_LABELS ?? "").split(",");

export async function handler(event) {
  const signature = event.headers["x-hub-signature-256"] ?? "";
  const payload = event.body;

  // Validate webhook authenticity
  const expected = "sha256=" +
    createHmac("sha256", WEBHOOK_SECRET).update(payload).digest("hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first
  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);
  if (sigBuf.length !== expBuf.length || !timingSafeEqual(sigBuf, expBuf)) {
    return { statusCode: 401, body: "Invalid signature" };
  }

  const body = JSON.parse(payload);

  // Only act on newly queued jobs whose labels this runner pool can serve
  if (body.action !== "queued" ||
      !RUNNER_LABELS.every((l) => body.workflow_job.labels.includes(l))) {
    return { statusCode: 200, body: "Ignored" };
  }

  // Enqueue for provisioning
  await sqs.send(new SendMessageCommand({
    QueueUrl: JOB_QUEUE_URL,
    MessageBody: JSON.stringify({
      jobId: body.workflow_job.id,
      repository: body.repository.full_name,
      labels: body.workflow_job.labels,
    }),
  }));

  return { statusCode: 200, body: "Queued" };
}

The SQS queue is a deliberate design choice. I could have the webhook handler launch EC2 instances directly, and early versions did exactly that. The problem is reliability. If the provisioner Lambda hits a GitHub API rate limit, or if you’re at your max runner count and need to wait, or if the EC2 API returns a transient error — the job is lost. With SQS, failed provisioning attempts automatically retry with backoff. Jobs that can’t be fulfilled after five retries move to a dead-letter queue for investigation. The queue adds maybe 100 milliseconds of latency, but it turns a fragile synchronous call into a durable pipeline.
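The five-attempts-then-DLQ behavior is just SQS's redrive policy. A sketch of the relevant queue attributes, with a placeholder DLQ ARN and an illustrative visibility timeout:

```javascript
// Redrive policy for the job queue: after 5 failed receives,
// SQS moves the message to the dead-letter queue automatically.
// The ARN is a placeholder for your actual DLQ.
const redrivePolicy = {
  deadLetterTargetArn: "arn:aws:sqs:us-east-1:123456789012:runner-jobs-dlq",
  maxReceiveCount: 5, // provisioning attempts before giving up
};

// Attributes as they'd be passed to CreateQueue / SetQueueAttributes
const queueAttributes = {
  RedrivePolicy: JSON.stringify(redrivePolicy),
  VisibilityTimeout: "120", // longer than one provisioning attempt
};
```

When the provisioner Lambda throws, the message simply becomes visible again after the timeout; no retry logic lives in your code.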

The runner provisioner Lambda consumes messages from the queue, checks whether the current runner count is below the configured maximum, fetches a fresh registration token from the GitHub API, and launches an EC2 instance with user data that configures and starts the runner.
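A minimal sketch of the provisioner's two key steps. The names (`MAX_RUNNERS`, `belowCapacity`) are conventions of this article, not a library API; the endpoint is GitHub's actual registration-token API, and the instance launch itself is the `launchRunner` function shown later:

```javascript
// Cap on concurrent runner instances (illustrative default)
const MAX_RUNNERS = 20;

// Pure capacity check, kept separate so it's easy to test.
// At capacity the Lambda throws, the SQS message becomes
// visible again, and the attempt retries with backoff.
function belowCapacity(activeRunnerCount, max = MAX_RUNNERS) {
  return activeRunnerCount < max;
}

// Fetch a fresh, single-use registration token for the org
async function getRegistrationToken(org, githubToken) {
  const res = await fetch(
    `https://api.github.com/orgs/${org}/actions/runners/registration-token`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${githubToken}`,
        Accept: "application/vnd.github+json",
      },
    },
  );
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  return (await res.json()).token; // passed to the instance via user data
}
```

Registration tokens expire after about an hour, which is one more reason to fetch them per job rather than cache them.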

Ephemeral Runners: One Job, One Instance

Every runner instance handles exactly one job and then terminates. This is GitHub’s --ephemeral flag in action, and it’s the single most important design decision in the system.

The EC2 instance boots with a user data script that configures a pre-installed runner binary, registers it with the GitHub organization using a single-use token, and waits for its assigned job. When the job completes — or if anything goes wrong — an EXIT trap ensures the instance terminates itself.

#!/bin/bash
set -euo pipefail

RUNNER_DIR="/opt/github-runner"
INSTANCE_ID=$(ec2-metadata -i | cut -d' ' -f2)
# Derive the region from the availability zone so the AWS CLI
# can terminate this instance without extra configuration
export AWS_DEFAULT_REGION=$(ec2-metadata -z | cut -d' ' -f2 | sed 's/[a-z]$//')

# Self-terminate on any exit
cleanup() {
  aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
}
trap cleanup EXIT

# Configure runner with ephemeral flag
cd "$RUNNER_DIR"
./config.sh \
  --url "https://github.com/${GITHUB_ORG}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABELS}" \
  --ephemeral \
  --unattended

# Run exactly one job, then exit (triggers cleanup)
./run.sh

The benefits are straightforward. Every job gets a clean environment — no leftover files, no stale Docker layers, no credentials from a previous run. There’s no security risk from a long-lived runner that accumulates state. And capacity management is simple: the number of running instances equals the number of active jobs.

The trade-off is startup latency. Booting an EC2 instance, configuring the runner, and registering with GitHub takes 60 to 90 seconds. I mitigate this with a pre-built AMI that has the runner binary, common build tools, and language runtimes already installed. The user data script only needs to configure and start the runner, not install it. For most CI workloads where jobs run for 5 to 15 minutes, a 90-second startup overhead is acceptable.

Spot Instances with On-Demand Fallback

Ephemeral runners and spot instances are a natural pairing. Spot interruption risk is a function of exposure time: an instance that exists for 10 minutes and self-terminates has a far smaller window in which AWS can reclaim it than a long-lived worker does. A short-lived ephemeral runner is about as safe as spot gets.

The provisioner attempts a spot launch first. If AWS returns a capacity error — InsufficientInstanceCapacity, SpotMaxPriceTooLow, or MaxSpotInstanceCountExceeded — it retries with an on-demand instance. No job is ever lost because spot capacity was unavailable.

import { EC2Client, RunInstancesCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({});

const SPOT_CAPACITY_ERRORS = [
  "InsufficientInstanceCapacity",
  "SpotMaxPriceTooLow",
  "MaxSpotInstanceCountExceeded",
];

async function launchRunner(config) {
  if (config.useSpotInstances) {
    try {
      const instance = await ec2.send(new RunInstancesCommand({
        ...config.baseParams,
        InstanceMarketOptions: {
          MarketType: "spot",
          SpotOptions: { SpotInstanceType: "one-time" },
        },
        TagSpecifications: [{
          ResourceType: "instance",
          Tags: [...config.baseTags, { Key: "MarketType", Value: "spot" }],
        }],
      }));
      return instance;
    } catch (err) {
      // SDK v3 surfaces the EC2 error code as the error's name
      if (!SPOT_CAPACITY_ERRORS.includes(err.name)) throw err;
      console.log("Spot unavailable, falling back to on-demand");
    }
  }

  return ec2.send(new RunInstancesCommand({
    ...config.baseParams,
    TagSpecifications: [{
      ResourceType: "instance",
      Tags: [...config.baseTags, { Key: "MarketType", Value: "on-demand" }],
    }],
  }));
}

Tagging instances with MarketType makes cost attribution straightforward. You can see exactly how much you’re spending on spot versus on-demand, and track it per repository or per team using additional tags. In practice, spot availability for m5.xlarge instances in multiple availability zones is high enough that the fallback rarely triggers — but when it does, the job still runs without any developer intervention.
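One way to wire that up is to extend the same TagSpecifications with team and repository keys. The tag names here are conventions I use, not anything AWS requires:

```javascript
// Build instance tags for cost attribution. Purpose/Repository/Team
// are illustrative keys; pick whatever your cost reports filter on.
function runnerTags(repository, team, marketType) {
  return [
    { Key: "Purpose", Value: "github-runner" },
    { Key: "Repository", Value: repository },
    { Key: "Team", Value: team },
    { Key: "MarketType", Value: marketType },
  ];
}

// Passed as TagSpecifications in RunInstancesCommand
const tagSpec = [{
  ResourceType: "instance",
  Tags: runnerTags("acme/api-service", "platform", "spot"),
}];
```

Activate these as cost allocation tags in the billing console and the spot-versus-on-demand and per-team breakdowns fall out of Cost Explorer directly.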

Some organizations disable spot entirely and use on-demand for every run. That’s a valid choice if your CI jobs include operations that are dangerous to interrupt, like terraform apply or database migrations. The architecture supports both modes with a single configuration toggle.

What I’d Evaluate Today

The landscape for self-hosted runners has improved significantly in 2026, and if I were starting fresh I’d seriously evaluate a few alternatives before building custom.

GitHub’s Runner Scale Set Client, released in public preview in February 2026, is a Go-based module that handles autoscaling without requiring Kubernetes. You define how runners are created and destroyed; the client manages the GitHub API interactions. It’s open source, and for teams that want autoscaling without maintaining their own webhook pipeline, it eliminates a meaningful chunk of custom code.

AWS CodeBuild now offers managed GitHub Actions runners with ephemeral environments and strong security boundaries. If your primary concern is reducing operational overhead and you don’t need custom AMIs or VPC placement, CodeBuild is the lowest-effort path.

The philips-labs/terraform-aws-github-runner module implements a very similar architecture to what I’ve described — webhook-triggered, Lambda-orchestrated, ephemeral EC2 instances with spot support. If you prefer Terraform over SAM, this is battle-tested and well-maintained.

Actions Runner Controller on Kubernetes is the reference implementation of GitHub’s scale set APIs. If you’re already running Kubernetes, ARC is the obvious choice. If you’re not, adopting Kubernetes specifically for CI runners is a heavy lift that I wouldn’t recommend.

I built custom because I needed specific AMI configurations, VPC placement for accessing internal services during builds, granular cost tagging by team and service, and tight integration with existing SAM-based infrastructure. Those requirements justified the investment. For teams without those constraints, an off-the-shelf solution will get you to the same place with less maintenance burden.

The Infrastructure You Understand Is the Infrastructure You Control

The cost savings from self-hosted runners are real and measurable. But the deeper value is understanding your CI/CD infrastructure as a system — one with clear inputs, predictable behavior, and operational characteristics you can reason about and optimize. When a build is slow, you can look at the instance type, the AMI contents, and the job configuration rather than filing a support ticket with a vendor.

With GitHub’s pricing trajectory uncertain and the ecosystem of runner solutions maturing quickly, the decision isn’t permanent. The webhook-driven, ephemeral runner pattern on AWS is the right level of complexity for teams that are already invested in the AWS ecosystem — simpler than running Kubernetes, more flexible than fully managed services, and cheaper than GitHub-hosted runners by an order of magnitude. Whether you build it yourself or adopt an existing module, the architectural principles are the same: scale to zero, keep runners ephemeral, and make sure no job is ever lost to a transient failure.