Amazon Web Services makes it very easy to log lots of data. For most of their services you can enable logging simply by specifying a destination bucket. Some examples include enabling request logging on an S3 bucket, enabling CloudFront access logs, and enabling CloudTrail logging.
It's easy enough to enable these and move on. Inevitably, though, you run into an issue, or you're calculating a usage metric, and you need to access and interpret these log files. Sounds easy enough, right? Normally, you'd have to:
- Navigate the S3 bucket to find the appropriate files. (Easier said than done.)
- Unzip the files if necessary.
- Search, grep based on position, or manually inspect.
For example, you might want to check the HTTP response codes for some requests served through CloudFront.
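The manual path above can be sketched roughly as follows. Everything here is illustrative: the log line is fabricated, and in a real run you would first pull the files down with something like `aws s3 cp s3://your-log-bucket/your-prefix/ . --recursive` (placeholder names, not a real bucket).

```shell
#!/bin/sh
# Fake one gzipped CloudFront-style log file, standing in for step 1 (the
# S3 download, which needs real credentials and a real bucket).
printf '2014-04-25\t01:13:11\tFRA2\t182\t192.0.2.10\tGET\td111.cloudfront.net\t/index.html\t200\n' \
  | gzip > sample-log.gz

# Step 2: unzip the files.
gunzip -f sample-log.gz   # produces ./sample-log

# Step 3: grep based on field position -- sc-status is the 9th
# tab-separated field in CloudFront web access logs.
awk -F'\t' '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' sample-log
```

The awk step is the part that gets tedious: you have to remember (or look up) which positional field is which for every log format.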
It's not impossible, but the process could be better, especially if it's something you find yourself doing often.
Why don't we just set up a log processing pipeline that streams the logs into a Map/Reduce or Loggly-like solution? Well, in short, we do for some of our logs! However, there are still plenty of times when we just want to access the logs and quickly review a specific timeframe or run a count.
Introducing Spotcheck, a command-line utility to quickly download and query log files stored on AWS. You provide the S3 bucket, prefix, log format, and an optional date. With that, it can determine the full names of the log files, then download, unzip, and concatenate them, and convert them into JSON.
$ spotcheck download config/cloudfront.json --date 2014-04-25
$ spotcheck report my-cloudfront-log.json --field sc-status
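For reference, the `config/cloudfront.json` file presumably holds the bucket, prefix, and log format mentioned above; the key names below are illustrative guesses, not Spotcheck's actual schema:

```json
{
  "bucket": "my-log-bucket",
  "prefix": "cloudfront/",
  "format": "cloudfront"
}
```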
Contributions and suggestions welcome!