
Static Site Hosting with S3 and CloudFront: The Decisions That Actually Matter

6 min read

This website — the one you’re reading right now — runs on S3 and CloudFront. No web servers, no containers, no managed hosting platform. The infrastructure is defined in a single CloudFormation template and deployed with a bash script. It costs less than a dollar a month.

But getting here required navigating a set of decisions that most static hosting tutorials either skip or get wrong. The typical guide starts with enabling S3 static website hosting, making the bucket public, and pointing CloudFront at the website endpoint. That approach works, but it leaves your content publicly accessible through the S3 URL, bypassing CloudFront entirely. The modern approach keeps the bucket fully private and uses Origin Access Control to let CloudFront authenticate to S3 on your behalf. That one decision — whether to make the bucket public — cascades into URL handling, caching behavior, and deployment workflow. Here’s how those pieces fit together.

Keep the Bucket Private

There are two fundamentally different ways to connect CloudFront to S3. The first uses S3’s static website hosting feature, which gives you a website endpoint that handles directory index resolution, custom error pages, and redirects. The catch is that this endpoint requires public read access to the bucket, or at minimum a permissive bucket policy. Even with CloudFront in front, your content is accessible directly through the S3 website URL.

The second approach uses the S3 REST API endpoint with Origin Access Control. The bucket stays completely locked down — all four PublicAccessBlock settings set to true — and CloudFront authenticates using SigV4 signed requests. The bucket policy grants s3:GetObject only to the cloudfront.amazonaws.com service principal, scoped to your specific distribution:

WebsiteBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: !Ref DomainName
    PublicAccessBlockConfiguration:
      BlockPublicAcls: true
      BlockPublicPolicy: true
      IgnorePublicAcls: true
      RestrictPublicBuckets: true

CloudFrontOAC:
  Type: AWS::CloudFront::OriginAccessControl
  Properties:
    OriginAccessControlConfig:
      Name: !Sub '${DomainName}-OAC'
      OriginAccessControlOriginType: s3
      SigningBehavior: always
      SigningProtocol: sigv4

WebsiteBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    Bucket: !Ref WebsiteBucket
    PolicyDocument:
      Statement:
        - Effect: Allow
          Principal:
            Service: cloudfront.amazonaws.com
          Action: s3:GetObject
          Resource: !Sub '${WebsiteBucket.Arn}/*'
          Condition:
            StringEquals:
              AWS:SourceArn: !Sub 'arn:aws:cloudfront::${AWS::AccountId}:distribution/${WebsiteDistribution}'

OAC replaced the legacy Origin Access Identity (OAI) and is what AWS now recommends. It works across all S3 regions, supports SSE-KMS encrypted objects, and uses standard SigV4 authentication rather than a special CloudFront-specific identity. The Condition block in the bucket policy is the part that matters most — it ensures only your specific CloudFront distribution can read from the bucket, not just any CloudFront distribution in any AWS account.

The URL Rewriting Problem Nobody Warns You About

Choosing the REST API endpoint over the website endpoint buys you proper security, but it introduces a problem that will trip you up immediately: S3’s REST API does not resolve directory paths to index files.

When you use the S3 website endpoint, a request for /blog/ automatically serves /blog/index.html. The REST API endpoint treats /blog/ as a literal object key lookup — there's no object with that key, so you get a 403 (not a 404, because the bucket policy grants s3:GetObject but not s3:ListBucket, and S3 won't reveal whether a key exists without list permission). CloudFront has a DefaultRootObject setting, but it only applies to the root path (/). A request for /blog/ hits S3 unmodified.

The fix is a CloudFront Function attached to the viewer-request event that rewrites URIs before they reach S3:

function handler(event) {
  var request = event.request;
  var uri = request.uri;
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }
  return request;
}

This handles two cases: paths with a trailing slash (/blog/ becomes /blog/index.html) and bare paths without a file extension (/blog becomes /blog/index.html). The !uri.includes('.') check is a pragmatic heuristic — it assumes that if the URI contains a dot, it’s a request for an actual file like /style.css or /image.png, so it should pass through untouched. For a static site generated by Hugo or any similar tool, this works reliably.
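The rewrite rules are easy to sanity-check outside CloudFront. Here's a rough bash equivalent (a hypothetical helper, not part of the deploy script) that mirrors the function's logic:

```shell
# Mirror the CloudFront Function's viewer-request rewrite rules.
rewrite_uri() {
  local uri="$1"
  if [[ "$uri" == */ ]]; then
    echo "${uri}index.html"        # trailing slash: /blog/ -> /blog/index.html
  elif [[ "$uri" != *.* ]]; then
    echo "${uri}/index.html"       # bare path: /blog -> /blog/index.html
  else
    echo "$uri"                    # has a dot: /style.css passes through
  fi
}

rewrite_uri /blog/
rewrite_uri /blog
rewrite_uri /css/main.abc123.css
```

Running it over a list of the site's expected URLs is a cheap way to confirm the heuristic still holds before pushing a function change.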

CloudFront Functions run at the edge, execute in under a millisecond, and cost pennies per million invocations. Before they existed, you needed Lambda@Edge for this — a heavier solution that runs in a full Node.js runtime with cold starts and higher per-request costs. For simple URL rewriting, CloudFront Functions are the right tool.

Caching for Fingerprinted Assets

Hugo — like most static site generators — fingerprints CSS and JavaScript files by appending a content hash to the filename. When the file content changes, the hash changes, generating a new URL. This makes fingerprinted assets safe for aggressive caching: the URL is effectively a unique identifier for that specific version of the file.
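The mechanism is easy to demonstrate with nothing more than a checksum. This sketch (my illustration, not Hugo's actual asset pipeline) embeds a short content hash in a filename the same way:

```shell
# Simulate asset fingerprinting: embed a short content hash in the
# filename, so any change to the content produces a new URL.
fingerprint() {
  local file="$1" hash
  hash=$(sha256sum "$file" | cut -c1-8)    # first 8 hex chars of the digest
  echo "${file%.*}.${hash}.${file##*.}"
}

tmp=$(mktemp -d)
echo 'body { color: black; }' > "$tmp/main.css"
fingerprint "$tmp/main.css"
```

Change one byte of the CSS and the emitted name changes, which is exactly the property that makes a one-year immutable cache safe.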

The caching strategy follows directly from this property. Fingerprinted assets get a one-year cache with the immutable directive. HTML files get a short cache with must-revalidate, because HTML is what references those fingerprinted URLs — when you deploy new content, browsers need to pick up the new HTML to discover any new asset URLs.

The deploy script implements this with a two-phase S3 sync:

# Phase 1: Sync everything with long cache
aws s3 sync public/ s3://$BUCKET_NAME/ \
    --delete \
    --cache-control "public, max-age=31536000, immutable"

# Phase 2: Override HTML files with short cache
aws s3 cp s3://$BUCKET_NAME/ s3://$BUCKET_NAME/ \
    --recursive \
    --exclude "*" \
    --include "*.html" \
    --metadata-directive REPLACE \
    --content-type "text/html; charset=utf-8" \
    --cache-control "public, max-age=3600, must-revalidate"

The first command uploads everything with a one-year max-age. The second copies the HTML files back onto themselves within S3, replacing their metadata so that only they carry the one-hour max-age (and re-asserting the Content-Type). It's a bit of a hack — a single aws s3 sync can't set different metadata per file type — but it works cleanly and the cost is negligible.
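The net effect of the two phases is a single mapping from file type to header, which a small illustrative helper (not part of the actual script) makes explicit:

```shell
# The Cache-Control header each object ends up with after both phases:
# fingerprinted assets are immutable for a year, HTML revalidates hourly.
cache_control_for() {
  case "$1" in
    *.html) echo "public, max-age=3600, must-revalidate" ;;
    *)      echo "public, max-age=31536000, immutable" ;;
  esac
}

cache_control_for "index.html"
cache_control_for "css/main.abc123.css"
```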

On the CloudFront side, separate cache behaviors reinforce this. The css/* and js/* path patterns get a 7-day DefaultTTL, while the default behavior gets a 1-day DefaultTTL. CloudFront respects the origin’s Cache-Control header when it falls within the MinTTL/MaxTTL bounds, so the S3 headers and CloudFront behaviors work together rather than fighting each other.
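The interplay follows a simple clamping rule: CloudFront caches for the origin's max-age, bounded below by MinTTL and above by MaxTTL, with DefaultTTL applying only when the origin sends no Cache-Control at all. A sketch of that rule:

```shell
# Sketch of CloudFront's TTL clamping when the origin sends a
# Cache-Control max-age: honored only within the MinTTL/MaxTTL bounds.
effective_ttl() {
  local min_ttl=$1 max_ttl=$2 origin_max_age=$3
  local ttl=$origin_max_age
  if (( ttl < min_ttl )); then ttl=$min_ttl; fi
  if (( ttl > max_ttl )); then ttl=$max_ttl; fi
  echo "$ttl"
}

effective_ttl 0 31536000 3600        # HTML: the one-hour S3 header wins
```

With MinTTL at 0 and a generous MaxTTL, the S3 headers pass through unmodified — which is exactly the point.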

After every deploy, I run a full cache invalidation (/*). For a personal site, this is the pragmatic choice — the first 1,000 invalidation paths per month are free, and the simplicity of invalidating everything beats the complexity of tracking which files actually changed.

The ACM Certificate Trap and Other Constraints

There’s a constraint that catches almost everyone the first time: the ACM certificate for your custom domain must be in us-east-1, regardless of where your S3 bucket lives. My bucket is in eu-north-1, but the certificate is in us-east-1. CloudFront is a global service that reads certificates only from that region. My CloudFormation template encodes this constraint directly:

CertificateArn:
  Type: String
  Description: ARN of the ACM certificate in us-east-1 region
  AllowedPattern: ^arn:aws:acm:us-east-1:[0-9]{12}:certificate/[a-z0-9-]+$
  ConstraintDescription: Must be a valid ACM certificate ARN in us-east-1

The AllowedPattern validation catches the mistake at deployment time rather than letting it fail silently during distribution creation. Encoding operational knowledge into the template like this saves future-you a debugging session.
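The same pattern can double as a local pre-flight check before the stack update; a hypothetical helper using grep:

```shell
# Fail fast locally if the certificate ARN is not in us-east-1,
# mirroring the template's AllowedPattern.
check_cert_arn() {
  echo "$1" | grep -Eq '^arn:aws:acm:us-east-1:[0-9]{12}:certificate/[a-z0-9-]+$'
}

if check_cert_arn "arn:aws:acm:us-east-1:123456789012:certificate/abc-123"; then
  echo "certificate ARN ok"
fi
```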

Two other decisions worth mentioning: I use PriceClass_100, CloudFront's cheapest tier, which serves from edge locations in North America and Europe. For a personal site, paying for Asia-Pacific and South America edge locations doesn't make sense. And I run a separate CloudFront distribution for the root-domain redirect — lahtinen.org redirects to nikolas.lahtinen.org via a second S3 bucket configured purely for redirection. This redirect bucket uses the S3 website endpoint as a custom origin, which is the one place where the website endpoint is genuinely the right choice: you need S3's redirect capability, not file serving.

Build, Sync, Invalidate

The full deployment is a single script that runs four steps: update the CloudFormation stack if the template changed, build the site with Hugo, sync files to S3, and invalidate the CloudFront cache.

# Build
hugo --minify

# Sync to S3 (--delete removes files no longer in the build output)
aws s3 sync public/ s3://$BUCKET_NAME/ \
    --delete \
    --cache-control "public, max-age=31536000, immutable"

# Override HTML cache headers
aws s3 cp s3://$BUCKET_NAME/ s3://$BUCKET_NAME/ \
    --recursive --exclude "*" --include "*.html" \
    --metadata-directive REPLACE \
    --content-type "text/html; charset=utf-8" \
    --cache-control "public, max-age=3600, must-revalidate"

# Invalidate CloudFront cache
aws cloudfront create-invalidation \
    --distribution-id "$DISTRIBUTION_ID" \
    --paths "/*"

The --delete flag on s3 sync is important — without it, pages you remove from the site would continue to be served from S3 until you manually cleaned them up. With --delete, the S3 bucket always mirrors the build output exactly.

Because the CloudFormation template lives in the same repository as the site content, infrastructure changes and content updates flow through the same deployment path. Change a CloudFormation resource, add a blog post, and run the same script — the stack updates first, then the content deploys. This could trivially be wrapped in a GitHub Actions workflow, but for a personal site, running a script from my terminal is simpler and gives me direct feedback.

Static site hosting on AWS is more setup than dropping a repository into Netlify or Vercel. But the result is infrastructure defined in files I control, behavior I can trace through CloudFormation resources and S3 headers, and a deployment pipeline with no platform-specific abstractions. For a site that needs to exist for years with minimal maintenance, that transparency is worth the initial afternoon of configuration.