S3 and Glacier are simple cloud-based storage solutions that provide a secure and cost-effective way to back up your images.


Last week, I plugged one of my hard drives into my iMac to take a routine backup of my image library. I moved house not long ago, and this was the first time the drive had been plugged in for a few weeks. After the initial whirring of the disk starting up, I could hear a persistent clicking noise that didn't sound right. Sure enough, when I went to look for the drive on the computer, it wasn't showing up.

Initially my heart rate began to increase, but I quickly reminded myself that I have two other copies of all the files that were on that disk, so there was no need to panic. However, the thought of having to fork out for another hard drive was a bit annoying.

Back in October, I wrote about my backup strategy and mentioned several cloud storage providers, although I wasn't using any of them at the time. The fact that hard drives can and do fail without any warning made me revisit my thoughts on image security, and I've recently started to use Amazon's S3 storage as a cloud backup solution.

 

What is Amazon S3 and why use it?

Amazon Simple Storage Service (S3) is cloud-based storage for almost any kind of data. It's a highly scalable solution, meaning that large amounts of data can be stored and served quickly and reliably across the world. Probably 90% of its features and intended use cases are beyond my simple requirements - I just want to back up my images to an off-site location - so you might wonder why I don't use Google Drive, Dropbox or something similar. Well, the truth is I also use Google Drive and pay under £2/month for 100GB of data. In my Google Drive, I store things that change regularly - shopping lists, images for social media and other personal stuff. The idea is that I can easily access it all from my phone or other devices.

However, when it comes to backing up my image library, two points stand out.

  1. I need more than 100GB of storage space. Ideally, I'd like to be charged only for what I use, with an upper limit far beyond anything I'll ever fill, so that I never have to worry about running out of space.
  2. I don't need access to the data I back up in real time. The objective is to create an archive in 'cold storage' that simply grows over time as I add new images.

What makes Amazon's storage so appealing is the incredibly low cost and the durability of storing data in its long-term backup tier, Amazon Glacier. Currently, it costs only $0.0045 per GB per month - pretty cheap! Storing around 300GB will cost you about $1.35 per month, so you can see why it's an attractive solution.
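The arithmetic behind that figure is simple enough to sketch in a few lines of Python. The function name is my own, and the default rate is just the $0.0045 per GB per month quoted above, so swap in whatever the current price is for your region.

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.0045):
    """Estimate the monthly storage bill in dollars for an archive of size_gb.

    The default price is the Glacier rate quoted in this article; it is an
    assumption and may not match current pricing.
    """
    return size_gb * price_per_gb_month


# Around 300GB of images at the rate quoted above
print(f"${monthly_storage_cost(300):.2f} per month")  # prints $1.35 per month
```

Try it with the size of your own library to see roughly what you would pay.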

 

What if I need to access my data?

I mentioned that Glacier is designed primarily as an archiving solution and is not optimised for data to be transferred in and out regularly, so there can be a bit of a wait to retrieve everything if the worst happens. How long you need to wait depends on how much data you want to retrieve, but we're only talking a few hours. Also, be aware that it costs money to retrieve your data. A flat rate of $0.01 per GB and $0.05 per 1,000 requests applies to standard retrievals. The example quoted for a standard retrieval of 500 archives of 1GB each is 500GB x $0.01 + 500 x $0.05/1,000 for the requests = $5.025. Again, not a massive amount if you've just lost years of work.
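If you want to estimate what your own worst-case restore would cost, the same sum works as a small Python sketch. The function and parameter names are mine, and the default rates are only the standard-retrieval figures quoted above.

```python
def retrieval_cost(total_gb, requests, per_gb=0.01, per_1000_requests=0.05):
    """Estimate the dollar cost of a standard retrieval.

    total_gb is the amount of data restored and requests is the number of
    objects asked for. Both default rates are the ones quoted in this article.
    """
    data_charge = total_gb * per_gb
    request_charge = requests / 1000 * per_1000_requests
    return data_charge + request_charge


# 500 archives of 1GB each, as in the example above
print(f"${retrieval_cost(500, 500):.3f}")  # prints $5.025
```

Notice that the per-GB charge dominates: the request charge only starts to matter if you have a very large number of small files.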

Data retrievals come in 'tiers'. 'Standard' is the default, but you can also request 'expedited' retrievals, which cost more but make your data available in a matter of minutes. Lastly, 'bulk' retrievals are available at a lower cost but take longer.
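For the curious, here is a rough sketch of how a retrieval tier might be chosen from a script, assuming you use boto3 (Amazon's Python SDK) rather than one of the desktop tools. The helper names, the bucket and key you pass in, and the 7-day availability window are all my own placeholders. Only `restore_object` and the shape of its `RestoreRequest` come from boto3 itself.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days_available, tier="Standard"):
    """Build the RestoreRequest payload for a Glacier restore.

    days_available is how long the restored copy stays readable in S3.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"Unknown retrieval tier: {tier!r}")
    return {
        "Days": days_available,
        "GlacierJobParameters": {"Tier": tier},
    }


def request_restore(bucket, key, days_available=7, tier="Standard"):
    """Ask S3 to restore one archived object (needs boto3 and AWS credentials)."""
    import boto3  # third-party SDK, imported here so the helper above runs without it

    s3 = boto3.client("s3")
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days_available, tier),
    )


# The payload for a cheaper, slower bulk restore
print(build_restore_request(7, tier="Bulk"))
```

The restore is asynchronous: the call only queues the job, and the file becomes downloadable once the chosen tier's wait is over.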

Amazon S3 has a basic system of 'buckets', which are basically just folders. Just like in other file systems, you can have files and folders within each one. I have separate buckets for my nightly website backups, my image library, my podcasts and my YouTube videos to keep things separated and organised.

Unlike the Google Drives and Dropboxes of the world, there is no fancy app or program provided by Amazon to upload data to your bucket, so you have to use programs from third parties. For Windows, I recommend CloudBerry Explorer for Amazon S3. For macOS, Cyberduck is another easy-to-use solution. And of course, if you're comfortable with the command line, there is a great tool called S3cmd (S3Express for Windows).

One of the cool features of S3, as with Google Drive and the like, is that buckets can be made public and therefore accessed by anyone, anywhere. This is useful when you have audio or video files on your website that you would normally store on your own web server. Rather than taking up lots of space on your web server, you can put them in an S3 bucket and serve them from there, saving space and making use of Amazon's reliability and speed when it comes to serving data. It's a bit like embedding a YouTube video on your website - you can view the video there but it lives on YouTube. The difference is that you can store any kind of data in your bucket.
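To give an idea of what linking to a public file looks like, here is a small sketch that builds the address of an object in a public bucket. It assumes the common virtual-hosted URL layout (bucket name, then region, then the file's path). The bucket name, region and file name are made up for illustration.

```python
from urllib.parse import quote


def public_object_url(bucket, key, region):
    """Build the web address of a publicly readable object in an S3 bucket.

    Assumes the virtual-hosted URL style; characters such as spaces in the
    key are percent-encoded so the link is safe to paste into a web page.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical podcast episode stored in a hypothetical bucket
print(public_object_url("my-podcast-bucket", "episodes/ep 01.mp3", "eu-west-2"))
# prints https://my-podcast-bucket.s3.eu-west-2.amazonaws.com/episodes/ep%2001.mp3
```

The link only works if the object really is publicly readable, so check the bucket's permissions before using it on your site.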

 

Bucket lifecycles

Another cool feature of S3 is bucket lifecycles. Lifecycles are like rules for your bucket. Currently, on my master library backup, I have a rule set up that moves all the contents from the S3 Standard storage class to the Glacier storage class after a day. This means I have real-time access to my newly archived files for a day, in case I do something silly and delete them from all my other drives, before they get moved into Glacier for archiving (at a much lower cost). You can also configure the expiration of an object so that it is automatically deleted after a set number of days. Of course, for a long-term backup, I don't want to apply this.

There is a cost for lifecycle transitions: $0.053 per 1,000 requests.
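I set my rule up by clicking through the S3 web console, but the same rule can be expressed in code. The sketch below assumes boto3 (Amazon's Python SDK). The rule ID and the function names are my own inventions, and the empty prefix simply means "apply to everything in the bucket".

```python
def build_lifecycle_config(days_before_glacier=1):
    """Describe a rule that moves every object to Glacier after a number of days."""
    return {
        "Rules": [
            {
                "ID": "move-to-glacier",    # hypothetical label for the rule
                "Filter": {"Prefix": ""},   # empty prefix matches all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_glacier, "StorageClass": "GLACIER"}
                ],
                # No "Expiration" entry: nothing is ever deleted automatically
            }
        ]
    }


def apply_lifecycle(bucket, days_before_glacier=1):
    """Attach the rule to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # third-party SDK, imported here so the helper above runs without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_config(days_before_glacier),
    )


print(build_lifecycle_config()["Rules"][0]["Transitions"])
# prints [{'Days': 1, 'StorageClass': 'GLACIER'}]
```

Be aware that applying a lifecycle configuration this way replaces any rules already on the bucket, so include every rule you want to keep.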

 

My current S3 backup strategy

Here's a quick rundown of how I'm backing up my images to the cloud using Amazon S3. Bear in mind that in addition to this cloud backup, I have 2 separate hard drives, each with a copy of my master library, so in total that makes 3 individual backups.

  1. I set up a bucket on Amazon S3 called Master Library.
  2. I configured this bucket's lifecycle to transition anything that goes into it from the Standard storage class to the Glacier storage class after 1 day. This is the step that moves data to the low cost storage class.
  3. On my computer, I have a scheduled task that runs S3cmd every day to sync the images in my master library to the Amazon bucket. Importantly, the syncing operation does not delete any items from my Amazon bucket; it simply adds new items that don't exist there. This means that if I mistakenly delete a file from my computer, it should still exist in the cloud.

That's it. Three steps and I have a viable, long-term, secure backup of all the images I have ever made.
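The 'add only, never delete' behaviour in step 3 is the heart of the whole approach, so here is a tiny Python illustration of the idea. This is not how S3cmd is implemented, just a simplified sketch with made-up file names. A real sync tool also re-uploads files whose contents have changed, and S3cmd only removes remote files if you explicitly pass its --delete-removed option.

```python
def plan_sync(local_files, bucket_keys):
    """Return the files to upload: everything local that the bucket lacks.

    Files that exist only in the bucket are left alone, so deleting a file
    locally never removes its cloud copy.
    """
    return sorted(set(local_files) - set(bucket_keys))


# Hypothetical library: one image was deleted locally, one is brand new
local = ["2019/img_001.raw", "2019/img_002.raw"]
remote = ["2018/img_900.raw", "2019/img_001.raw"]

print(plan_sync(local, remote))  # prints ['2019/img_002.raw']
```

The 2018 image that has vanished from the local drive stays safely in the bucket, which is exactly the safety net I'm after.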

One of the drawbacks of this approach is the initial upload of data. For several hundred gigabytes of images, it may take a few days to put them all into your bucket, but moving data into a bucket is free - you just have to leave your computer running for a while or do it in batches.
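To get a feel for how long that first upload might take on your own connection, here is a rough back-of-the-envelope sketch. The 10 Mbit/s upload speed is purely an assumption for illustration, and real transfers will be a little slower because of overheads.

```python
def upload_days(size_gb, upload_mbit_per_s):
    """Estimate how many days it takes to upload size_gb at a steady speed."""
    megabits = size_gb * 8000  # 1GB = 1,000MB = 8,000 megabits
    seconds = megabits / upload_mbit_per_s
    return seconds / 86400     # 86,400 seconds in a day


# 300GB of images over an assumed 10 Mbit/s upload connection
print(f"{upload_days(300, 10):.1f} days")  # prints 2.8 days
```

Check your actual upload speed (not your download speed, which is usually much higher) before plugging in a number.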

 

Conclusion

I hope this article gives you some information about Amazon S3 and how, although daunting at first, it can be used as a reliable, low-cost archiving solution for your images - or anything else for that matter! If you need any more information about getting set up with a backup strategy like this, leave a comment below or get in touch.