Thoughts on the S3 Service Degradation


As I write this, the internet is blowing up over the Amazon S3 service degradation. Internally and externally, people are reaching out and asking, "What does this actually mean? Are we affected? It seems like there's a problem; we can't go down too!" Other variations of this panic are flooding across Twitter and many MSPs' support intake funnels, as well as account manager and executive inboxes, phones, and the like.

Here’s the good news, my vast and varied Chicken Littles: the sky is not falling. AWS is one of the most robust service offerings in the cloud provider space, and S3 is an important part of it, but it is not the only load-bearing member.

Let’s start with what it is: S3 is Amazon’s object storage service. Many services rely on it for upload storage, and many more use it as an inexpensive way to serve static content to the public. Some services even use it as a cheap extension to an EBS volume via s3fs, which I’ll touch on more in a bit.

However, while the problems with S3 may impact a variety of people, they won’t impact everyone, or even everyone using AWS. In fact, because of Amazon’s engineering and the general resiliency of S3, once the service is fully available again you should have no problem doing anything you could do before, including retrieving any files or content that were stored there.

What may be failing right now: serving of images, video, and other static content; backups and snapshots that rely on S3 as a backing store; CloudWatch logs that pump into S3; and, of course, anything that relies on s3fs.

Here’s where it gets technical!

So why is that last one a bit more troublesome? Systems that actively rely on an S3 bucket as an extension of their local filesystem could be making blocking reads or writes to that volume, which may cause other actions on the local filesystem to fail.

Pretend you’re a six-year-old who’s been told to put things away in a specific order, and a train needs to go into a toy chest, but the toy chest is closed and locked. Most six-year-olds will be stuck, as will filesystems that can’t do what they’re told. They simply don’t have the ability to skip ahead and move to the next thing. In the case of filesystems, they actually shouldn’t: if they can’t store a file (or bits of a file, or just bits), they should error out. It’s up to the application to handle the situation gracefully. Many applications don’t, because they rely on the assumption that a filesystem will always be available if the system is up.
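As a rough sketch of what "handling it gracefully" means in practice, here's a minimal Python example (the paths and function name are hypothetical, and a real upload path would be an s3fs mount point): the application catches the filesystem error and reports it instead of assuming the mount is always there.

```python
def save_upload(path, data):
    """Try to write to a (possibly s3fs-backed) path; report failure
    instead of letting the error take down the whole application."""
    try:
        with open(path, "wb") as f:
            f.write(data)
        return True
    except OSError as e:
        # The backing store (e.g. an s3fs mount) may be unavailable;
        # surface a clear failure so callers can queue or retry the write.
        print(f"could not store {path}: {e}")
        return False

# A write to an unavailable mount fails cleanly rather than crashing.
ok = save_upload("/unavailable-mount/toybox/train.bin", b"choo choo")
```

The point isn't the specific error handling; it's that the application, not the filesystem, is the right layer to decide what happens when the toy chest is locked.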

Luckily, many people who have architected applications from the ground up to use s3fs or S3 buckets are smart enough to realize that S3 could (but shouldn’t) become unavailable. People who are using it as a hacky way to reduce costs or work around certain EBS limitations may not be so lucky, and they may end up with EC2 instances going offline or applications simply becoming unavailable.
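What does that ground-up architecture look like? One common pattern is retry-with-fallback: try the S3-backed store a few times with backoff, then spill to local disk so the application stays up while S3 is degraded. A minimal sketch, with caller-supplied writers and illustrative names of my own invention:

```python
import time

def store_with_fallback(write_remote, write_local, data,
                        retries=3, delay=0.1):
    """Attempt the remote (S3-backed) write a few times; on repeated
    failure, fall back to local storage for later replay."""
    for attempt in range(retries):
        try:
            write_remote(data)
            return "remote"
        except OSError:
            # Simple exponential backoff between attempts.
            time.sleep(delay * (2 ** attempt))
    write_local(data)  # queue locally so nothing is lost
    return "local"

# Simulate a degraded remote store that always errors out.
def broken_remote(data):
    raise OSError("remote store unavailable")

queued = []
result = store_with_fallback(broken_remote, queued.append, b"snapshot",
                             retries=2, delay=0)
```

Applications built this way degrade to a slower or queued mode instead of going dark, which is exactly the difference you're seeing today between services that shrugged off the outage and services that vanished with it.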

In the end, though, people with proper safeguards (and very likely even those without) will probably have forgotten this happened by the time the weekend rolls around, because the impact is limited to S3’s degradation. When people start seeing their movies, images, and websites pop back up without any problems, the worst will be over and most of the world will go back to what’s happening with celebrities according to TMZ.

To conclude, do as the H2G2 recommends: DON’T PANIC.


Stephen Sadowski

Leader focusing on quality, delivery, technical debt management, and leadership education about DevOps and SRE practices