From EC2 to Fargate

An introduction to the new container orchestration service from AWS

Problem

AWS diagrams, as usual, hiding the complexity of your final setup

The diagram above is concerned only with Fargate and container orchestration. If you are build anything of moderate size you will need to consider much more beyond that.

You will need to consider what type of application you are deploying.

  • Are they under different domains and APIs?
  • How quick will you need to redeploy aspects of you infrastructure? Minutes, hours, days?
  • What is the security consideration of each app?
  • Do you need to secure some APIs and not others?
  • How will you deal with logging and monitoring?
  • Is it behind a load balancer?

We also need to consider CI/CD pipelines which link to each of these new platforms.

Disclaimer

As described above you will need to setup VPCs, subnets, route tables, security groups. I could go on… This post will focus purely on Fargate and its setup.

How we got there

Due to time constraints and the fact that Fargate did not exist at the time, the first version of our infrastructure deployed our applications to EC2 instances directly.

The EC2 instances were controlled by autoscaling groups to deal with increased load. Consul was used for service discovery in conjunction with Fabio as a reverse proxy. Packer was used to describe the underlying VM.

This setup was very fast to implement but had some major issues:

  • Wasteful since our applications did not use all the underlying EC2 resources
  • Deployments took down services since the EC2 instances have to be frequently restarted
  • Local testing was error prone since your could not simulate the EC2 environment completely

One benefit to this is that you could have ssh access into the EC2 instances which meant you could debug certain bugs faster. However most of those bugs were created from not having immutable containers in the first place.

Before we can orchestrate containers we need lots of containers!

Once we have created the images for each application we can move on to an orchestration framework. We chose ECS since it was simple to setup and we were fully committed to AWS already.

ECS

ECS works by forming a logical group of EC2 instances called a cluster. Services are deployed on each cluster. These services ensure that each task we specify is up and running with the correct number of task instances needed to service the current load level. A task is a logical grouping of containers which share resources and linked by local network interface. You should only place containers in the same task if you want them to scale together. In most cases a a task will contain only a single container.

As mentioned above, we need to move the applications over to docker since ECS is a container service. Below you will find an example docker file for one of our node applications using the builder pattern to optimise image download size from ECR.

We have now built a system which

  • Checks availability of each service
  • Restarts broken services
  • Spreads services between AZs
  • Quickens deployment updates to minutes
  • Keeps constant uptime.

Perfect! However, you still have EC2 instances. You still need to manage those resources and describe to each ECS task how much of the underlying EC2 resources you want to dedicate to it.

Fargate

Fargate simplifies infrastructure management for many companies by removing the need to monitor the underlying EC2 instances.

How do we solve the issues above?

We could employ Fargate to orchestrate our containers without managing EC2 instances. The decision to do this depends on a few factors

  • The size of your workloads
  • The size of your environments
  • The periodicity of your applications
  • The size and number of each application (ENIs limits)

If you have small workloads or environments or lots of cron jobs then Fargate is perfect for you. You do not want to deploy and manage EC2 instances which you are not utilising.

The last point is linked to the number of ENIs you can attach to EC2 instances which limits the number of IP addresses you can link to your EC2 instances. This means that ECS will only fill up an EC2 instance with more tasks until it can’t associate more ENIs. So if you have lots of tiny containers then you will be underutilising your EC2 instance since you will only be able to deploy a certain number of containers into your instances.

This is were you need Fargate. Our applications were tiny node apps so Fargate was a perfect fit for us.

Disclaimer

Fargate employees a a very particular network strategy known as awsvpc which consumes all ENIs as far as I know. This mean you are very limited in what kind of networks you can setup relative to something like kubernetes. So it was a good fit for us but maybe not yourself. This can even be an issue setting up static ip addresses if, for whatever reason, your application needs that.

Note: This last point I am not 100% sure on and would have to do more research. I would definitely say fargate will always be a simpler and therefore a more limited solution compared to many other rivals.

Implementation

The setup of this will depend on your needs. I want to demonstrate our ECS module for setting up applications and focus on those issues since there might be some overlap with your own requirements. Highlights:

  • Setting up listeners and target groups
  • Connecting to ALB and routing
  • Targeted scaling
  • Task definitions

Below you will see we pass in many arguments to this module. In production we have this setup so we only have to change a single variables file to deploy new applications. This was needed since we have new applications and platforms deployed every other day. This was a good design for us but does mean the terraform code can be rather complex. However, it is between that and duplicating code which was not an option for us due to time constraints and the size of our platform.

Check out the comments below to see more detail about some of the variables and gotchas.

Listeners and target groups link our ALB routes to running tasks. They also specify health checks and other options like sticky sessions.

CPU and memory util was used for targeted autoscaling. Targeted autoscaling tries to keep these metrics to a fixed value. So you do not under or over utilise your underlying resources.

Note: you should be careful what metric you use with targeted autoscaling. The metric must be something which will decrease linearly as you add or remove task instances.

Terraform creates the task definitions for each app using a template.

  • Port numbers are the same for each task now since IP varies using the AWSVPC network mode
  • We can attach a log driver without having to install cloudwatch agent on each EC2 instance
  • Task level environment variables

The last point is important. The environment variables at this stage set values which are app independent or anything which will not change. We do not set environment variables at this stage since we want to be able to set new environment variables without re-running terraform. It highlights that configuration is not a single event but can often come in stages. Also note we pass in redis and any other service names we have just created with terraform. This is nice since we don’t have to update a separate config store with information we already have.

Conclusion

Fargate is much easier to setup than other alternatives but you need to know exactly what your requirements are before you start. Other solutions exist like Kubernetes or you could continue down the serverless road and end up with a fully serverless solution with AWS lambda.

The decision is yours.

A DevOps engineer specialised in cloud infrastructure with a background in theoretical and experimental physics.