Adding Cache-Control with S3FS

Caching is a big deal at SpanishDict. We cache our rendered jade views, all of our database lookups, all of our client-side code, and every single image that we show to our users. We serve all of these assets through CDNs to further improve performance on the site. Then we encourage browsers to cache these assets for as long as possible.

Except, that is, for the SpanishDict blog. Well, until this week!

The SpanishDict blog

Our blog is very important at SpanishDict. We have a fantastic team that develops original content for it. It's featured on our homepage. It's the subject of a previous blog post, where we describe the architecture we used to reach zero-downtime deployments.

Here's the short version:

  • Lives on Elastic Beanstalk
  • Stores posts in an RDS instance
  • Thinks it saves images locally on the file system
  • Which is actually an S3 bucket, mounted with s3fs

Following a microservice architecture, the blog is a completely different world from the rest of SpanishDict. When the homepage wants to feature four articles, it makes a request to the blog's RSS feed (the result of which is cached, of course), which tells it which post titles to render and which images to fetch to display along with them.

Cache-Control

When you request images from a Ghost instance, it nicely sets a Cache-Control header on all of them, which tells the browser to cache those images for as long as possible. However, to lessen the load on the blog and increase speed, we wanted the homepage to request the images from CloudFront instead. CloudFront is an AWS offering which acts as a CDN on top of S3.

If you request an asset from CloudFront, the response it sends back will only have the headers that are specified on that asset in S3. That meant that to get our Cache-Control header, it had to be sent along with the file when we first uploaded it to S3. Hopefully s3fs has a way to do that...

ahbe_conf

It does! When you mount a directory with s3fs, you can pass it a flag like -o ahbe_conf=file.conf. In this file you can configure all additional headers you would like to send with your uploads. More details on how to do that here.

In Elastic Beanstalk terms, it meant adding the following under the files: key of one of our YAML config files in .ebextensions/:

"/home/ec2-user/s3fs-fuse-1.77/caching_ahbe.conf":
    owner: root
    group: root
    content: |
        # Send custom headers to s3 for caching these files as long as possible
        Cache-Control public, max-age=31536000
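In case the number 31536000 looks like magic: it is just 365 days expressed in seconds. Here is a tiny sketch of that arithmetic in shell. The variable names are only for illustration and do not appear anywhere in our config.

```shell
# One year in seconds: 365 days * 24 hours * 60 minutes * 60 seconds
max_age=$((365 * 24 * 60 * 60))

# The value we want S3 to store alongside each uploaded image
cache_control="public, max-age=${max_age}"

echo "Cache-Control: ${cache_control}"
```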

And changing our mounting command to:

/usr/bin/s3fs $S3_BUCKET /var/local/images -o allow_other -o use_cache=/tmp -o nonempty -o ahbe_conf=/home/ec2-user/s3fs-fuse-1.77/caching_ahbe.conf

Now all our new images will have a Cache-Control header when served through CloudFront!
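If you want to check this for yourself, one way is to look at the response headers of a freshly uploaded image. The helper below is a rough sketch rather than anything we run in production: cache_control_value is a made-up name, and the URL in the usage comment is a placeholder, not a real distribution.

```shell
# Reads raw HTTP response headers on stdin and prints the value of the
# Cache-Control header, if one is present. Header names are matched
# case-insensitively, since HTTP/2 responses use lowercase names.
cache_control_value() {
  tr -d '\r' | grep -i '^cache-control:' | sed 's/^[^:]*:[[:space:]]*//'
}

# Usage (hypothetical URL, replace with one of your own images):
#   curl -sI "https://EXAMPLE.cloudfront.net/path/to/image.jpg" | cache_control_value
```

If nothing is printed, the object was most likely uploaded before the ahbe_conf change and still carries its old headers.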

What about our old images?

You got me. Old images will still have the same headers they were originally uploaded with. Luckily, updating them is easy with s3cmd.
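As a rough sketch of what that could look like, s3cmd has a modify command and an --add-header option that can be combined with --recursive to rewrite the metadata of every object in a bucket. The bucket name below is a placeholder, and the APPLY switch and the update_old_images function are inventions for this example so that nothing is changed by accident: by default the script only prints the command it would run.

```shell
# Hypothetical bucket name; replace with the bucket that s3fs mounts.
S3_BUCKET="${S3_BUCKET:-example-blog-images}"

# Same value we send for new uploads via ahbe_conf.
CACHE_HEADER="Cache-Control:public, max-age=31536000"

# Prints the s3cmd command by default; set APPLY=1 to actually run it.
update_old_images() {
  if [ "${APPLY:-0}" = "1" ]; then
    s3cmd modify --recursive --add-header="$CACHE_HEADER" "s3://$S3_BUCKET/"
  else
    echo "s3cmd modify --recursive --add-header=\"$CACHE_HEADER\" s3://$S3_BUCKET/"
  fi
}

update_old_images
```

It is probably worth trying this on a single object first and checking its headers before letting it loose on the whole bucket.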