Julian Wraith


AWS Elastic File System

Yesterday, in a knowledge session between Solutions Architects, the topic of AWS Elastic File System came up, and after a short discussion we decided to take a closer look and set something up. To quote Top Gear, how hard could it be?

What is EFS?

AWS Elastic File System, or EFS, is Amazon Web Services’ latest storage solution: fully managed, simple and scalable file storage for use with EC2 instances. As the name suggests, it grows and shrinks automatically with your storage needs, and EC2 instances in multiple availability zones can access EFS over NFS (v4.1) with low latency and high throughput (50 MB/s per TB of storage, with bursts of up to 100 MB/s). AWS lists the use cases of EFS as big data and analytics, media processing workflows, content management, web serving and home directories. Content management, you say? Hmmm :)

From my past, scalable single sources of file-system-based content were expensive and difficult to deploy. So much so that product and implementation strategies made putting all content in a database by far the most logical route to take. So could EFS now resolve that headache? I will give it a test to find out.

What do I have to set up?

So I will simulate a website setup with an application server tier that would host my Tomcat (or similar) application servers, and a back-end file system mounted on those application servers so that they can use the files. Onto that file system I will deploy my content. I won’t install or configure Tomcat; it is simple to do and covered very well elsewhere.

The simple architecture

So, I will need:

  1. An auto-scaling group covering two availability zones (eu-west-1a, eu-west-1b) with two instances of Amazon Linux (no Tomcat, no auto-scaling rules for now)
  2. A security group that allows my auto-scaling instances to talk NFS to my EFS file system
  3. An EFS file system, created and mounted on my instances

I have created a simple auto-scaling group, and it is up and running across my two availability zones. I have also terminated an instance or two just for fun. That’s not related to this post; it is just fun to terminate something and watch it auto-magically reappear.

My security group allows instances that are members of my auto-scaling group’s security group to access the EFS file system over NFS (TCP port 2049).
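
If you would rather script this than click through the console, the rule can be sketched with the AWS CLI. This is a minimal sketch, assuming the CLI is installed and configured; the function name and the security group IDs in the example line are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): allow NFS into the EFS security group, but only
# from instances that carry the auto-scaling group's security group.
allow_nfs_from_group() {
  local efs_sg="$1"   # security group attached to the EFS mount targets
  local app_sg="$2"   # security group attached to the auto-scaling instances
  # NFS uses TCP port 2049; the source is another security group, not an IP range
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" \
    --protocol tcp \
    --port 2049 \
    --source-group "$app_sg"
}

# Example (hypothetical IDs):
# allow_nfs_from_group sg-11111111 sg-22222222
```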

My Security Group

I can now create the EFS file system for my website content.

I first need to configure the file system access, which consists of my VPC, my mount targets (one per availability zone) and the security group that defines the source of access requests (the one I created earlier):

Configuration of EFS

Then I configure the optional settings. I have chosen to give it a friendly name and stuck with the default “Performance Mode” of General Purpose.

Configure the EFS options

Then comes the final review step, and I am done. That was it: no configuring disk sizes, no difficult calculations of how much content I have. It’s done.

Review what I did
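
For comparison, the console steps above can be approximated with the AWS CLI. The sketch below assumes a configured CLI; the function names, creation token, subnet IDs and security group ID are all hypothetical, and I have left out the friendly name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): create a General Purpose EFS file system and
# add one mount target per availability zone.
create_efs() {
  local token="$1"   # any unique string; EFS uses it to make creation idempotent
  aws efs create-file-system \
    --creation-token "$token" \
    --performance-mode generalPurpose \
    --query FileSystemId --output text
}

add_mount_target() {
  local fs_id="$1" subnet_id="$2" sg_id="$3"
  aws efs create-mount-target \
    --file-system-id "$fs_id" \
    --subnet-id "$subnet_id" \
    --security-groups "$sg_id"
}

# Example (hypothetical IDs; one subnet in eu-west-1a, one in eu-west-1b).
# The file system must be "available" before mount targets can be added.
# fs_id=$(create_efs website-content)
# add_mount_target "$fs_id" subnet-aaaa1111 sg-11111111
# add_mount_target "$fs_id" subnet-bbbb2222 sg-11111111
```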

After a short while my volumes are ready, and I can keep track of the creation status in the main EFS dashboard under “life cycle state”.

After a short while they will be ready
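
Rather than refreshing the dashboard, the same life cycle state can be polled from the CLI. A rough sketch, assuming a configured AWS CLI; the function name, retry count and delay are arbitrary choices of mine.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll the file system's LifeCycleState, the
# value the dashboard shows as "life cycle state", until it is "available".
# Returns 0 once available, 1 if it is still not available after all attempts.
wait_for_efs() {
  local fs_id="$1" attempts="${2:-30}" delay="${3:-10}"
  local i state
  for ((i = 1; i <= attempts; i++)); do
    state=$(aws efs describe-file-systems --file-system-id "$fs_id" \
      --query 'FileSystems[0].LifeCycleState' --output text)
    echo "attempt $i: $state" >&2
    if [ "$state" = "available" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (hypothetical ID): wait_for_efs fs-12345678 && echo "ready to mount"
```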

Next, I am going to test-drive mounting my volume on my instance. EFS provides instructions for doing this from the dashboard. Running the following in an SSH session (from the root directory):

Step 1: If needed, install the NFS client on your EC2 instance

sudo yum install -y nfs-utils

Step 2: Create a new directory on your EC2 instance, such as “efs”

sudo mkdir efs

Step 3: Mount your file system using the DNS name.

sudo mount -t nfs4 -o nfsvers=4.1 $(curl -s efs
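
The instruction in step 3 is cut off above, so here is a hedged sketch of the general shape of the mount command instead. It assumes the EFS DNS name format file-system-id.efs.region.amazonaws.com (or the variant prefixed with an availability zone); the file system ID fs-12345678 and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): build the EFS DNS name and mount it over NFS v4.1.
efs_dns_name() {
  local fs_id="$1" region="$2" az="${3:-}"
  if [ -n "$az" ]; then
    # Availability-zone-specific name, e.g. eu-west-1a.fs-12345678.efs.eu-west-1.amazonaws.com
    echo "${az}.${fs_id}.efs.${region}.amazonaws.com"
  else
    # File-system-wide name; resolves to the mount target in the caller's zone
    echo "${fs_id}.efs.${region}.amazonaws.com"
  fi
}

mount_efs() {
  local fs_id="$1" region="$2" mount_point="$3"
  sudo mkdir -p "$mount_point"
  sudo mount -t nfs4 -o nfsvers=4.1 "$(efs_dns_name "$fs_id" "$region"):/" "$mount_point"
}

# Example (hypothetical file system ID):
# mount_efs fs-12345678 eu-west-1 /efs
```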

Once that is done, I can switch to the directory and create a simple index.html file for my eventual Tomcat server to serve. If I then log on to my other instance, I can see that the file is already there, even though that instance is in the other availability zone: both instances are using the same file system. This means that if I write my content to disk as I have done, it is available instantly in the other availability zones and all my sites are updated.

As I did this manually, I would need to repeat it every time my auto-scaling group scales out, which defeats the purpose of auto-scaling. However, if I mount this directory at instance initialization time (e.g. with Chef), it will be mounted whenever a new instance starts. To test this, I made a very simple launch script and updated my launch configuration (in fact I made a new one, as launch configurations cannot be edited) to add the following to the user data portion of the configuration.

#!/bin/bash
# User data only runs as a script when it starts with a #! line
cd /
sudo mkdir -p efs
sudo mount -t nfs4 -o nfsvers=4.1 $(curl -s efs

Warning: I would not use this code in production. No really, please don’t.
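
A somewhat sturdier variant (still a sketch, not production code) is to have the user data add the file system to /etc/fstab, so it is remounted on every reboot as well as at launch. This assumes Amazon Linux with user data run as root; the file system ID and helper names are hypothetical, and the mount options are the NFS settings AWS recommends for EFS, plus _netdev so the mount waits for the network.

```shell
#!/usr/bin/env bash
# Sketch of user data (hypothetical helpers): mount EFS at boot via /etc/fstab.
# Assumes Amazon Linux and that the script runs as root.
efs_fstab_line() {
  local fs_id="$1" region="$2" mount_point="$3"
  local opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev"
  echo "${fs_id}.efs.${region}.amazonaws.com:/ ${mount_point} nfs4 ${opts} 0 0"
}

setup_efs_mount() {
  local fs_id="$1" region="$2" mount_point="$3"
  yum install -y nfs-utils
  mkdir -p "$mount_point"
  # Append the entry only once, so re-running the script is harmless
  grep -q " ${mount_point} nfs4 " /etc/fstab ||
    efs_fstab_line "$fs_id" "$region" "$mount_point" >> /etc/fstab
  mount -a -t nfs4
}

# In user data you would then call, for example (hypothetical ID):
# setup_efs_mount fs-12345678 eu-west-1 /efs
```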


The most complicated part of this is mounting the drives, as creating the fully managed, scalable storage is incredibly easy. For content management systems like SDL Web (Tridion), this is a real help in deploying content in a scalable and reliable way.

Amazon Web Services, Simple Storage Service (S3)

Having recently joined Amazon Web Services (AWS), I need to deep dive into all the services to understand their features in as much detail as I can. Simple Storage Service (or S3) has recently been a focus for me, partly for my own learning but also because I need to support my customer with questions on S3. So what is S3? We shall start with a quote from the AWS documentation that describes it in one paragraph much better than I can:

Amazon Simple Storage Service (Amazon S3), provides developers and IT teams with secure, durable, highly-scalable cloud storage. Amazon S3 is easy to use object storage, with a simple web service interface to store and retrieve any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use. There is no minimum fee and no setup cost.

Object Storage

S3 is an object store, which means it does not store files the way a file system does but rather as objects, each consisting of the file itself and its metadata. Metadata contains information about the data (the file) and can be used to support application behaviors and administrative actions. Objects are organized into buckets, and each bucket needs a name that is unique across all of S3, so you need to be a little creative in naming your buckets because chances are someone has already taken the name “test”. Individual objects can range from 0 bytes to 5 TB, a bucket can hold any number of them, and objects can be organized into folders and subfolders.
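
One thing worth knowing about those folders: S3 keys are flat strings, and the folders you see come from grouping keys on a delimiter, normally "/". The sketch below imitates that grouping locally with awk over a list of made-up keys, purely to illustrate the idea; against a real bucket, `aws s3api list-objects-v2 --bucket <bucket> --delimiter /` returns the equivalent groups as CommonPrefixes. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): emulate how flat S3 keys become "folders".
# Reads one key per line on stdin and prints each distinct prefix that ends at
# the first "/" after the given prefix, like a listing with delimiter "/".
common_prefixes() {
  local prefix="$1"
  awk -v p="$prefix" '
    substr($0, 1, length(p)) == p {
      rest = substr($0, length(p) + 1)
      i = index(rest, "/")
      if (i > 0) {
        cp = p substr(rest, 1, i)
        if (!(cp in seen)) { seen[cp] = 1; print cp }
      }
    }'
}

# Example with made-up keys:
# printf 'docs/a.html\ndocs/img/b.png\nindex.html\nimages/c.png\n' | common_prefixes ''
```

With those keys, an empty prefix yields docs/ and images/ (index.html is a top-level object, not a folder), while the prefix docs/ yields docs/img/.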


S3 resources can have access policies that dictate who and what can access a given S3 resource (e.g. an object or a bucket). Access policies attached to resources are called resource-based policies. If you attach an S3-related policy to a user in your AWS account, this is referred to as a user policy. A user policy may say whether the given user has permissions on a bucket, whereas a resource policy may state that “everyone” can read a bucket. A combination of policy types can be used to manage access to objects.
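
As an example of a resource-based policy, the “everyone can read” case might look like the bucket policy generated below. Treat it as a sketch: the function name and bucket name are hypothetical, and a policy like this makes every object in the bucket public, so only use it for content that is meant to be public.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a bucket policy that lets anyone
# ("Principal": "*") read every object in the given bucket.
public_read_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Example (hypothetical bucket name):
# public_read_policy my-example-bucket > policy.json
# aws s3api put-bucket-policy --bucket my-example-bucket --policy file://policy.json
```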

By default, buckets are closed to the outside world, so you need to open up access if you want them to be used by other resources or users, and you can make that access as fine-grained as you require. It’s generally a good policy to restrict access as much as possible and to enable features like MFA Delete so that mistakes are harder to make.

Durable (and available)

One of the major features of S3 is the availability and durability of the service. In on-premises environments, you needed to go to a lot of expense and effort to make your storage highly available. S3 gives you, at the click of a mouse, mostly four nines (99.99%) of availability. Why only mostly four nines? You do not always need that level of availability, so AWS offers a different S3 storage class called Standard - Infrequent Access, or Standard-IA. This storage class drops the availability to three nines (99.9%) and should only be used for data you need from time to time and whose temporary unavailability is not a major issue (for example, old product documentation).
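
To make the nines a little more tangible, here is a small calculation of the most downtime each availability figure allows over a 365-day year. It is plain arithmetic, not an AWS tool, and the function name is my own.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): convert an availability percentage into the
# maximum downtime it allows over a 365-day year, in minutes.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

# Example:
# downtime_minutes_per_year 99.99
# downtime_minutes_per_year 99.9
```

Four nines works out to about 52.56 minutes a year, and three nines to 525.60 minutes, or roughly 8.76 hours.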

If your data is not available, you can be very sure that it has not gone anywhere. The difference between availability and durability is the difference between whether your data is accessible and whether your data still exists. You can lose access to the data but be sure that it will still be there when access is restored. For most storage classes, durability is a startling eleven nines (99.999999999%), which in essence means it is near impossible to lose an object. Reduced Redundancy Storage (RRS) has a lower durability and should only be used for things you can afford to lose, such as copies of data or temporary data. Still, at four nines, I would class it as highly durable.

Storage class   Durability       Availability
Standard        99.999999999%    99.99%
Standard-IA     99.999999999%    99.9%
Glacier         99.999999999%    99.99% (after you restore objects)
RRS             99.99%           99.99%
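
To put these durability figures into perspective, a back-of-the-envelope calculation: if each object independently has an annual durability d, the expected number of objects lost per year out of N stored is N(1 - d). With a hypothetical ten million objects:

```latex
\begin{aligned}
\mathbb{E}[\text{objects lost per year}] &= N\,(1-d) \\
\text{Standard: } N = 10^{7},\ d = 0.99999999999 &\;\Rightarrow\; 10^{7}\times 10^{-11} = 10^{-4} \\
\text{RRS: } N = 10^{7},\ d = 0.9999 &\;\Rightarrow\; 10^{7}\times 10^{-4} = 10^{3}
\end{aligned}
```

In other words, ten million objects in Standard storage would lose one object every 10,000 years on average, whereas the same objects in RRS could be expected to lose around a thousand a year, which is why RRS suits data you can recreate.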



S3 is highly scalable, and you do not need to do anything to enable that; it’s all part of the service.

Storing and retrieving

Objects can be uploaded, updated, deleted, etc. from the AWS Management Console. However, in normal use you are most likely to work with objects programmatically, via the SDKs that talk to S3’s RESTful interface, probably through a product that uses S3 as a storage tier. S3 has a consistency model of read-after-write consistency for PUTs of new objects and eventual consistency for overwrite PUTs and DELETEs. This means that when you PUT an object for the first time, it is readable directly after being written. When you overwrite an object, however, consistency is eventual: the new version will be available on all replicas eventually. There is therefore a chance that an application reads the older version of an object if it reads the object after it has been overwritten but before the change has reached all replicas.

For large uploads (over 100 MB), you should consider multipart uploads. In a multipart upload, the file you are uploading is broken into parts that are sent separately, and S3 reassembles them into the complete object once all the parts have arrived. Doing so not only improves throughput (e.g. by uploading parts in parallel, or uploading while the file is still being created) but also improves the reliability of your uploads (e.g. recovering from network errors, or pausing and resuming).
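
The splitting step can be sketched as simple arithmetic: given a file size and a part size, work out the byte range each part covers. This is a local illustration with a hypothetical function name, not the SDK’s own logic; it does check S3’s documented limits of a 5 MB minimum part size (the last part excepted) and at most 10,000 parts per upload. In practice, `aws s3 cp` switches to multipart uploads on its own for large files.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): plan the parts of a multipart upload.
# Prints "part_number first_byte last_byte" for each part.
plan_parts() {
  local size="$1" part_size="$2"
  local min=$((5 * 1024 * 1024))  # S3 minimum part size (the last part may be smaller)
  if (( part_size < min )); then
    echo "part size is below the 5 MB minimum" >&2
    return 1
  fi
  local count=$(( (size + part_size - 1) / part_size ))  # ceiling division
  if (( count > 10000 )); then
    echo "$count parts exceeds the 10,000-part limit; use larger parts" >&2
    return 1
  fi
  local n start end
  for ((n = 1; n <= count; n++)); do
    start=$(( (n - 1) * part_size ))
    end=$(( n * part_size - 1 ))
    if (( end >= size )); then
      end=$(( size - 1 ))  # the final part holds whatever is left
    fi
    echo "$n $start $end"
  done
}

# Example: a 12 MB file (12,582,912 bytes) split into 5 MB parts
# plan_parts 12582912 5242880
```

For that example this prints three parts, the last one covering the final 2 MB (bytes 10485760 to 12582911).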


With S3 you pay only for what you use, and there are no setup costs or up-front fees. The AWS website holds details of the costs, which differ per storage class (e.g. the price per GB), so with good storage planning you can save significant expenditure. Organizations with existing data in, say, Standard S3 could also remodel their storage to improve its cost effectiveness.

Other important S3 features

Lifecycle Management

Lifecycle Management allows you to manage the lifetime of objects in your S3 buckets against rules you have defined. A simple use case of this is managing backup data. For backups, you typically have a policy that dictates how long you store your backups.

For example, you keep daily backups for the last 30 days and then a monthly backup for the last 12 months.

This means that you need to automatically remove monthly backups older than 12 months and daily backups older than 30 days. With lifecycle management you can do this. Moreover, you can add additional storage-class rules. For instance, you could decide to keep the last 7 days of backups on Standard storage, the 7-30-day-old backups on Standard-IA, and all the monthly backups on Glacier, all of which lowers the cost of storing the backups.
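
A minimal sketch of the backup example as a lifecycle configuration, assuming daily backups are written under a hypothetical daily/ prefix and monthly ones under monthly/. One caveat worth knowing: S3 lifecycle rules only move objects from Standard to Standard-IA once they are at least 30 days old, so a 7-day transition to Standard-IA cannot be expressed as a lifecycle rule; this sketch therefore simply expires daily backups at 30 days and sends monthly backups straight to Glacier. The function name and bucket name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): lifecycle rules for the backup policy.
# Assumed layout (hypothetical): daily backups under daily/, monthly under monthly/.
backup_lifecycle_json() {
  cat <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-daily-backups",
      "Filter": { "Prefix": "daily/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    },
    {
      "ID": "archive-monthly-backups",
      "Filter": { "Prefix": "monthly/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
}

# Example (hypothetical bucket name):
# backup_lifecycle_json > lifecycle.json
# aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
#   --lifecycle-configuration file://lifecycle.json
```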


Versioning

Versioning allows you to keep previous versions of objects as they are updated, and it can be used in combination with Lifecycle Management. You need to enable it on each bucket you want to use it on (as it costs storage), but it then makes it much more difficult to permanently lose something.
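
Enabling versioning on a bucket is a single call. A minimal sketch with the AWS CLI, assuming it is configured; the function names and bucket name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): turn on versioning for one bucket and check it.
enable_versioning() {
  aws s3api put-bucket-versioning --bucket "$1" \
    --versioning-configuration Status=Enabled
}

versioning_status() {
  # Prints "Enabled", "Suspended", or "None" if versioning was never enabled
  aws s3api get-bucket-versioning --bucket "$1" --query Status --output text
}

# Example (hypothetical bucket name):
# enable_versioning my-example-bucket && versioning_status my-example-bucket
```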

Cross-Region Replication

Cross-Region Replication allows you to asynchronously copy data from one S3 bucket to another in a different region. S3 is a regional service, and data never leaves a region unless the customer enables this function. So, as with versioning, you need to enable it on your bucket, decide which destination bucket, in which region, will be the target of replication, and choose what to replicate (all or a subset of the bucket).

To do this, both buckets need versioning enabled, the two buckets need to be in two different regions, and S3 needs permission (through an IAM role) to replicate the data from one bucket to the other. It’s important to note what is and is not replicated, because things like Lifecycle Management configurations need to be set up per region and are not copied by replication.
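
A replication configuration for a whole bucket might be sketched as below. It assumes versioning is already enabled on both buckets and that an IAM role S3 can assume already exists; the role ARN, bucket names and function name are hypothetical placeholders. Note that only objects written after the configuration is in place are replicated.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): replication configuration that copies every
# new object from the source bucket to a destination bucket in another region.
# The role ARN is a placeholder for a role S3 can assume to read the source
# bucket and write to the destination bucket.
replication_json() {
  local role_arn="$1" dest_bucket="$2"
  cat <<EOF
{
  "Role": "${role_arn}",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Prefix": "",
      "Status": "Enabled",
      "Destination": { "Bucket": "arn:aws:s3:::${dest_bucket}" }
    }
  ]
}
EOF
}

# Example (hypothetical names):
# replication_json arn:aws:iam::123456789012:role/s3-replication my-bucket-eu-central-1 > replication.json
# aws s3api put-bucket-replication --bucket my-bucket-eu-west-1 \
#   --replication-configuration file://replication.json
```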

© 2019 Julian Wraith. All rights reserved.
