Fargate Design Patterns

Tom Thumb’s Journey to Fargate on a few Pennies

AWS Fargate is the Uber of container services, allowing engineers to hail a container by specifying their compute and memory needs. By providing on-demand flexibility and removing the burden of resource provisioning, just as Lambda did for servers years ago, Fargate is disrupting container management.


Making software behave predictably across the different environments in which it is deployed during an application’s lifecycle is one of the biggest challenges of development. Subtle differences in system software over which developers have no control, even between installations of the same operating system, can cause unexpected behaviors that are hard to debug and fix.

Containers were invented to solve this problem. A container packages an application or service together with its runtime environment, including just the dependent libraries and configuration needed to run it, into a software package that is portable across hosts and environments. By sandboxing the application and opening only the ports it needs to communicate with the outside world, containers also improve security by reducing the blast radius, and they increase the number of services that can run on a unit of hardware.

First released in 2013, Docker brought containers into the mainstream. Kubernetes followed in 2014, allowing containers running across multiple heterogeneous hosts to be orchestrated by automating provisioning, networking, load balancing, security, and scaling through a single dashboard or command line. Both of these technologies still required upkeep of the underlying cluster of servers and operating systems through upgrades, patching, rehydration, and security management. Amazon introduced ECS and EKS as platform services to streamline this management process for Docker and Kubernetes respectively.

What is AWS Fargate?

Put simply, AWS Fargate is a container management solution provided by AWS that runs your containers without you having to manage a cluster of servers. You don’t have to choose server types, upgrade or patch servers, or optimize container packing on your clusters.

This is analogous to hailing an Uber. With Uber, you just say what size car you want based on how many people are riding, and whether you want a car seat or a wheelchair-accessible car. You don’t specify a Lexus or a Toyota. With Fargate, all you do is package your application in containers, specify the CPU and memory requirements, define networking and IAM policies, and launch the application. Fargate takes care of scaling, so you don’t have to worry about provisioning enough compute resources for your containers to scale out, or about scaling them in when they are not utilized. In essence, Fargate separates the task of running containers from the task of managing the underlying infrastructure. Developers simply specify the resources that each container requires, and Fargate handles the rest. As a result, applications deployed on Fargate can save you time, manpower, and money.

If you are used to traditional container management, you will really appreciate Fargate allowing you to focus on the ‘Dev’ part of designing and building your applications and reduce the ‘Ops’ part of managing infrastructure from your ‘DevOps’ responsibilities.

Components of AWS Fargate

Fargate Components


A Task is the smallest deployable unit on Fargate. A Task can be composed of one or many containers. You use a Task Definition as a blueprint that specifies which container images, from which container repository, you want to use for running your Task. It also specifies the CPU, the memory, and the roles to use for executing the task. Fargate then uses this information to spin up the containers for executing the Task.
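To make the blueprint idea concrete, here is a minimal sketch of a Task Definition expressed as a Python dictionary. The helper name `build_task_definition` and the example values are assumptions made for illustration; the CPU and memory table reflects the Fargate sizes AWS published around the time of writing and may have changed since.

```python
# Fargate CPU units -> allowed memory values in MiB (as published around 2019;
# treat this table as an assumption and check the current AWS documentation).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def build_task_definition(family, image, cpu, memory, execution_role_arn):
    """Return a dict shaped like the input to ECS RegisterTaskDefinition.

    Hypothetical helper: it only assembles and validates the blueprint,
    it does not call AWS.
    """
    if memory not in FARGATE_SIZES.get(cpu, []):
        raise ValueError(f"unsupported Fargate size: cpu={cpu}, memory={memory}")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # the network mode Fargate tasks use
        "cpu": str(cpu),          # task-level sizes are passed as strings
        "memory": str(memory),
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {"name": family, "image": image, "essential": True},
        ],
    }
```

With boto3 installed and credentials configured, a dictionary like this could be passed on as `boto3.client("ecs").register_task_definition(**task_definition)`.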


Fargate allows you to run and maintain a specified number of Tasks simultaneously in an Amazon ECS cluster. This is called a Service. If any of your tasks should fail or stop for any reason, the Amazon ECS service scheduler launches another instance of your task definition to replace it and maintain the desired count of tasks in the service depending on the scheduling strategy used.

In addition to maintaining the desired count of tasks in your service, you can optionally run your service behind a load balancer. The load balancer distributes traffic across the tasks that are associated with the service.


An Amazon ECS Cluster is a logical grouping of tasks or services. Clusters are AWS region specific and can contain tasks using both the Fargate and EC2 launch types.

AWS Fargate – the Good, Bad & Ugly

Good & Bad: Pay Per Use

Fargate is a good choice if you are leaving a lot of computing power and memory unused. Unlike ECS or EKS clusters running on EC2 instances, where you pay for the instances whether or not they are busy, you pay only for the CPU and memory your tasks request while they are running. It also integrates well with other AWS services, allowing you to schedule tasks and run them based on events while automatically fading them out when not in use.

Fargate gives you an opportunity to cut costs by charging you only for the time your container is running. However, the average per-hour cost of running Fargate is higher than the per-hour cost of running ECS or EKS, in spite of a major price reduction in Jan 2019, proving once again that there is no free lunch. The cost differential is the price you pay for not having to deal with the complexity of managing infrastructure or invest time and resources in the cluster management that comes with the traditional solutions.

As a result, the onus is on you to make the right choice based on the size of your workload, availability of skilled resources to manage and secure clusters, etc.

Good: Low Complexity

With its Container-as-a-Service model, you don’t have to worry about the underlying infrastructure you need for deploying your container, how you will optimize usage or secure them. Instead, your focus reduces to the four walls of your container – its size, power, and communication with the outside world aka memory, CPU, and networking.

Good: Better Security

Since you don’t have to worry about securing the entire cluster of servers, your security concern is reduced to security within the container, the roles required to run your application, the ports that must be opened for the application that is running inside the container to communicate with the outside world, etc.

Good: Faster Development

As the problems of systems management are alleviated, developers spend less time on operational issues and can focus on solving business problems and building services.

Good: Scaling

As Fargate is serverless, scaling is taken care of by the provider seamlessly. As a result, you do not have to consider the number of concurrent requests you can handle. Having said that, if you integrate Fargate with downstream server-based solutions, you should expect an increase in load on those components when your services running on Fargate scale out significantly.

Bad: Limited Availability

While AWS is rolling out Fargate to as many regions as it can, Fargate is not as widely available as Lambda, ECS, or EKS. As of April 2019, Fargate is not available in GovCloud, Sao Paulo, Paris, Stockholm, Japan, and China.

Behavioral Design Patterns for AWS Fargate

Behavioral patterns provide solutions for better interaction between components and foster loose coupling, while providing the flexibility to extend these components easily and independently of each other.

In this section, we will explore three behavioral design patterns for AWS Fargate: the Container-on-Demand, Scaling-Container, and Sidecar-Assembly patterns. The first two allow Fargate to be used just like Lambda for heavy on-demand tasks where Lambda is not suitable, or allow you to run containers traditionally but without having to manage infrastructure. The third shows how to attach sidecar containers to a parent container to provide supporting features for the application.

We will use the Container-on-Demand pattern to build an on-demand video thumbnail service that generates thumbnail images from video files. With this pattern, you can spin up containers on demand and decommission them immediately after the task has run.

We will use the Scaling-Container pattern to build an auto-scaling service that finds the value of the coins thrown on a table from an image. With this pattern, you keep a small footprint always running and scale out or in as processing demand changes.

Later we will explore the Sidecar-Assembly pattern to deploy components of an application into a separate container to provide isolation and encapsulation.

Container-on-Demand Pattern

Context & Problem

AWS Lambda lets you run functions as a service. This allows you to build applications as a collection of serverless microservices that react to events, letting you focus on core functionality while benefiting from easy deployment, automatic scaling, and fault tolerance. But Lambda has many resource limitations and, in general, is not efficient for long-running jobs.

For instance, these are the current limitations on Lambda (as of April 2019):

  • The deployment package size limit is 50 MB (zipped) and 250 MB (unzipped).
  • Memory range is from 128 to 3008 MB.
  • Maximum execution timeout for a function is 15 minutes.
  • Request and response (synchronous calls) body payload size can be up to 6 MB.
  • Event request (asynchronous calls) body can be up to 128 KB.

These are severe limitations for several types of applications, including machine learning models, where the size of the libraries can go well above the 250 MB limit on the unzipped deployment package, or where a batch may take longer than 15 minutes to run.

As a result, it is not possible to run large workloads or long-running processes on Lambda. Further, the limit on the size of the software package restricts the type of workloads you can run on Lambda. For instance, if you have a machine learning model that requires large libraries such as Scikit, NumPy, etc., it can be impossible to fit the software package in a Lambda deployment.


Deploy your software package in a container as a Fargate Task. Invoke the task using a Lambda. The Fargate Task is started from a dormant state. Once the process is complete and the output is written to the output repository, the Task is automatically stopped. As a result of this, you pay only for the time the Task is running. Additionally, you can preconfigure the size of the task (vCPU, memory, environment variables to pass parameters to the job) or override it for every invocation.
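The invocation step can be sketched in Python roughly as follows. This is an illustrative sketch, not the code from the companion repository: the function names, the cluster, subnet, and container names are all assumptions. The request mirrors the parameters of the ECS RunTask API, with environment variable overrides used to pass job parameters to the container.

```python
def build_run_task_request(cluster, task_definition, container_name, subnets, env):
    """Assemble keyword arguments for ECS RunTask (hypothetical helper)."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                # Assumption: the task runs in a public subnet and needs a
                # public IP to pull its image; adjust for private subnets.
                "assignPublicIp": "ENABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Per-invocation parameters passed as environment variables.
                    "environment": [
                        {"name": key, "value": str(value)}
                        for key, value in sorted(env.items())
                    ],
                }
            ]
        },
    }


def invoke_task(request, ecs_client=None):
    """Start the Fargate task; boto3 is imported lazily so the builder stays testable."""
    if ecs_client is None:
        import boto3  # available by default in the AWS Lambda Python runtime
        ecs_client = boto3.client("ecs")
    return ecs_client.run_task(**request)
```

Keeping the request builder separate from the AWS call is a design choice for this sketch: the builder can be unit tested without credentials, and the Lambda handler only needs to call `invoke_task(build_run_task_request(...))`.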

Container on Demand Pattern

The entry point in the container can be as trivial as a shell script or as complex as a web service. The point to note here is that the job submitted to the Fargate Task should be asynchronous. As a result, large software packages running large workloads can be run using this pattern.

Pattern Components

  • Input Repository – The input for your Processor is stored here and should be reachable by the processor. This could be an S3-based object store or a database. Ideally, this repository should notify the task invoker when a new object is uploaded or updated.
  • Task Invoker – A short-running function that is used to invoke your Processor. This could be a Lambda function or a synchronous service running as part of another larger process chain.
  • Processor – A long-running task that is the core of the pattern. It is invoked by the Task Invoker. This could be a Fargate Task that reads its input from the Input Repository, processes it and writes back the output to the Output Repository. The Fargate task can be configured to use one or more containers (with a maximum of 10).
  • Output Repository – Results of the Fargate Task are stored here. Again, this could be an S3 store or a database and could be optionally configured to emit events on inserts and updates.


While this pattern puts Lambda on steroids, Fargate has its own resource limitations due to its serverless nature. For instance, the number of tasks using the Fargate launch type, per region, per account cannot be more than 50, and the maximum container storage for tasks using the Fargate launch type cannot exceed 10 GB.

If you think your workloads will breach these limitations, you should seriously consider AWS EMR or AWS Glue for your solution’s tech stack.

Container-on-Demand Pattern – Example

Tom Thumb – A Video Thumbnail Generator Task

Tom Thumb is a video thumbnail generator task. It is implemented following the Container-on-Demand pattern. In typical usage, a user uploads a video file to an S3 bucket. A trigger on the S3 bucket notifies a Lambda function whenever a file is uploaded to the video folder in the bucket. The Lambda is deployed with Python code that extracts the name of the video file from the notification event and invokes a Fargate task. The Fargate task consists of one container that uses the FFmpeg application to decode the video and freeze an image at a given position in the video. The frozen image is written to a pre-configured folder in an S3 bucket.
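The two ends of this flow can be illustrated with a short Python sketch. It is not the code from the companion repository: the function names are hypothetical, and the FFmpeg arguments are one common way to grab a single frame, assumed here for illustration.

```python
from urllib.parse import unquote_plus


def video_objects(event):
    """Extract (bucket, key) pairs from an S3 notification event.

    Object keys arrive URL-encoded in S3 notifications, so they are decoded here.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def ffmpeg_thumbnail_command(input_path, output_path, position="00:00:05"):
    """Build an FFmpeg command that writes one frame taken at `position`.

    Assumed invocation: seek with -ss, emit a single video frame with
    -frames:v 1, and overwrite any existing output with -y.
    """
    return [
        "ffmpeg",
        "-ss", position,
        "-i", input_path,
        "-frames:v", "1",
        "-y",
        output_path,
    ]
```

Inside the container, the returned list could be handed to `subprocess.run(command, check=True)` once the video has been downloaded from S3.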

Code Repository

All code examples, prerequisites, and instructions are available in the tom-thumb subproject of the companion Git repository.

Scaling Container Pattern

Context & Problem

In the problem section of the Container-on-Demand pattern we discussed how the limitations on long-running processes rule out Lambda for asynchronous workloads. Therefore, we use the Container-on-Demand pattern to overcome the time limitation of Lambda which cannot exceed 15 minutes.

While the Container-on-Demand pattern solves this issue, for synchronous web services that execute within these limits, the main limitations are the size of the deployment package, networking, or the language supported in Lambda.

As of this writing in April 2019, AWS Lambda natively supports Java, Go, PowerShell, Node.js, C#, Python, and Ruby code. More recently, AWS Lambda introduced a Runtime API that allows you to author your functions in additional programming languages. While the concept of allowing you to bring your own runtime is radical, it is not straightforward, as can be seen from this author’s experiment here.

How do we run synchronous services where the size of the deployment package exceeds the Lambda limits?

While Lambda Layers mitigate some of this issue by allowing artifacts to be shared between Lambdas, they introduce their own set of issues, especially around testing Lambdas locally, and layers still count towards the 250 MB hard limit on the unzipped deployment package size.

What if you want to run always-on services that can scale on-demand?

Note that the Container-on-Demand pattern spins up a task to execute the job and then spins it down. For asynchronous workloads, the time taken to spin up is not an issue. But for synchronous web services, time is dear.


One possible solution is to use a Fargate Service fronted by an Application Load Balancer:

  • Deploy your service in a Fargate Task
  • Open ports for two-way communication in the Task and Container
  • Create an ECS Service to wrap around the Fargate Task.
  • Attach an Application Load Balancer in front of the Fargate Service.
  • Register an auto-scaling target with rules on when to scale out your service and when to scale it in.
Scaling Container Pattern

Pattern Components

  • Fargate Task – A Fargate task that has its ports open for two-way communication using one or more containers (within a maximum limit of ten containers).
  • ECS Service – An ECS service that uses the Fargate Task from above identifying the desired count of tasks that must be run at any given time.
  • Application Load Balancer – An Application Load Balancer with a listener to forward requests to the ECS Service.
  • API Gateway – An optional API gateway configured to forward requests to the application load balancer.
  • Web Interface – An optional browser-based interface for allowing users to post requests to the service. This could be a simple HTML form.
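The auto-scaling step of this pattern can be sketched as two Application Auto Scaling requests: one that registers the ECS service as a scalable target and one that attaches a target-tracking policy. The helper names, the policy name, and the choice of average CPU utilization as the metric are assumptions for illustration; the article's companion code may configure scaling differently.

```python
def build_scaling_requests(cluster, service, min_tasks, max_tasks, target_cpu_percent):
    """Return (scalable_target, scaling_policy) request dicts (hypothetical helper)."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # Bounds on the number of tasks the service may run.
    target = dict(common, MinCapacity=min_tasks, MaxCapacity=max_tasks)
    # Scale out and in to keep average CPU near the target value.
    policy = dict(
        common,
        PolicyName=f"{service}-cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    )
    return target, policy


def apply_scaling(target, policy, client=None):
    """Send both requests; boto3 is imported lazily so the builder stays testable."""
    if client is None:
        import boto3
        client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

For example, `build_scaling_requests("demo-cluster", "bean-counter", 1, 4, 75)` describes a service that always keeps one task running and grows to at most four.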

Scaling Container Pattern – Example

Bean-counter – A Coin-counter Service

Bean Counter is a coin counter service. It analyzes an image of coins and returns the total value of the coins in the image. It works only on coins issued by the US Mint and does not recognize any denomination above a quarter dollar. It also assumes that the picture contains a quarter, which is used to calibrate the size of the coins. It is implemented following the Scaling-Container pattern. In typical usage, a user navigates to the URL of the ALB in the browser and enters the URL for the service along with the location of the image file containing the picture of the coins. The Bean-Counter service then invokes the Fargate Task and returns the response to the browser.
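The calibration idea can be illustrated with a small sketch. This is not Bean Counter's actual algorithm, which lives in the companion repository; it assumes that coin diameters have already been measured in pixels by some image-processing step, and that the largest coin in the picture is the quarter. The diameters are the US Mint's published specifications.

```python
# Nominal diameters in millimetres and values in cents.
US_COINS = {
    "penny": (19.05, 1),
    "nickel": (21.21, 5),
    "dime": (17.91, 10),
    "quarter": (24.26, 25),
}


def total_cents(diameters_px):
    """Sum coin values given measured diameters in pixels (illustrative only).

    Assumes the largest measured coin is a quarter and uses it to convert
    pixels to millimetres; each coin is then matched to the nearest diameter.
    """
    if not diameters_px:
        return 0
    mm_per_px = US_COINS["quarter"][0] / max(diameters_px)
    total = 0
    for diameter in diameters_px:
        size_mm = diameter * mm_per_px
        name = min(US_COINS, key=lambda coin: abs(US_COINS[coin][0] - size_mm))
        total += US_COINS[name][1]
    return total


print(total_cents([242.6, 190.5, 179.1, 212.1]))  # prints 41
```

A real implementation would also have to cope with perspective distortion and measurement noise, since the penny and the dime differ by only about a millimetre.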

Code Repository

All code examples, prerequisites, and instructions are available in the bean-counter subproject of the companion Git repository.

Sidecar Assembly Pattern


Services require orthogonal technical capabilities, such as monitoring, logging, configuration, and networking services. While the components encapsulating these orthogonal capabilities can be integrated into the main service, it will leave the main service exposed to the vagaries of these components. For instance, they will not be well isolated, and an outage in one of these components can affect other components or the entire service. Also, they usually need to be implemented using the same language as the parent service. As a result, the component and the main service have close interdependence on each other.

One option is to deploy these orthogonal components as separate services allowing each component to have its own life-cycle and be built using different languages. While this gives more flexibility, deploying these features as separate services can add latency to the application.


Co-deploy the orthogonal components along with the main service by placing them in their own containers. Containers in a task are co-deployed together in the same host thereby not affecting the latency of the service significantly for the communication between them. As a result of this co-deployment, the sidecar and the main service can access the same resources. This allows the sidecar to monitor system resources used by both the sidecar and the primary service.
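As a rough illustration, a task that co-deploys a main container and a sidecar could be described as below. The container names, the `APP_ENDPOINT` variable, and the task size are hypothetical. The sketch relies on two ECS behaviors: containers in the same Fargate task share a network namespace, so they can reach each other over localhost, and a container marked as non-essential can stop without stopping the rest of the task.

```python
def build_sidecar_task(family, app_image, sidecar_image, app_port):
    """Return a task definition dict with a main container and one sidecar."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "256",     # smallest Fargate size, chosen only for the example
        "memory": "512",
        "containerDefinitions": [
            {
                "name": "app",
                "image": app_image,
                # If the main service stops, the whole task stops with it.
                "essential": True,
                "portMappings": [{"containerPort": app_port, "protocol": "tcp"}],
            },
            {
                "name": "monitor-sidecar",
                "image": sidecar_image,
                # A sidecar outage should not take the main service down.
                "essential": False,
                "environment": [
                    {"name": "APP_ENDPOINT", "value": f"http://localhost:{app_port}"}
                ],
            },
        ],
    }
```

Because the sidecar is a separate image, it can be written in a different language from the main service and released on its own schedule.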

Side Car Assembly Pattern

This pattern also enables applications to be composed of heterogeneous components and services, giving them capabilities beyond what the individual services provide. In essence, the whole is greater than the sum of its parts. The sidecar also shares the same lifecycle as the parent application: it is created and retired alongside the parent.


Each application is unique and solves different needs based on business requirements. If the task of infrastructure management is too onerous and/or if you only want to pay for your computing time, then Fargate may be the right choice for you.

On the other hand, if you need greater control of the network resources or have large container workloads with consistent demand throughout the day, then it warrants maintaining a cluster of servers to run ECS or EKS. With the latter choice, you can use reserved or spot instances to offset your cost.

Scenarios where Fargate is most Beneficial

Fargate can be used with any type of containerized application. However, this doesn’t mean that you will get the same benefit in every scenario. Fargate would be most beneficial for projects that need to reduce the time from ideation to realization such as proofs-of-concept and well-designed, decoupled, micro service-based architectures deployed in production environments.

Applications can consist of a mix of Fargate & Lambda to exploit the Serverless model.

Use Lambdas for small & tight services with low memory (<3GB) and small request-response cycles (<15 mins).

Use containers deployed on Fargate for:

  • Existing legacy services that cannot be trivially refactored.
  • Applications written in languages not supported by Lambda.
  • Applications that need large libraries that cannot fit into a Lambda profile (Quantlib, Scikit, etc.).
  • Cases where you need more control over networking, computing horsepower, or memory.
  • Use cases that require a long in-process runtime.
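These rules of thumb can be captured in a small helper. The thresholds are the April 2019 Lambda limits quoted in this article; the function name and the shape of its return value are illustrative assumptions, and the helper deliberately ignores softer factors such as cost and cold-start time.

```python
# Lambda limits quoted in this article (as of April 2019).
LAMBDA_MAX_MEMORY_MB = 3008
LAMBDA_MAX_RUNTIME_S = 15 * 60
LAMBDA_MAX_UNZIPPED_PACKAGE_MB = 250


def suggest_platform(memory_mb, runtime_s, package_mb, language_supported=True):
    """Return ("lambda" | "fargate", reasons) for a workload (illustrative only)."""
    reasons = []
    if memory_mb > LAMBDA_MAX_MEMORY_MB:
        reasons.append("needs more memory than Lambda offers")
    if runtime_s > LAMBDA_MAX_RUNTIME_S:
        reasons.append("runs longer than the Lambda timeout")
    if package_mb > LAMBDA_MAX_UNZIPPED_PACKAGE_MB:
        reasons.append("deployment package exceeds the Lambda size limit")
    if not language_supported:
        reasons.append("language is not natively supported by Lambda")
    return ("fargate", reasons) if reasons else ("lambda", [])
```

For instance, `suggest_platform(512, 30, 40)` suggests Lambda, while a 20-minute batch job with a 400 MB package is pointed to Fargate with two reasons attached.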

Scenarios where Fargate may not be the Best Choice

  • When you require greater control of your EC2 instances, for example to support specific networking needs or COTS applications that require broader customization options, use ECS or Kubernetes without Fargate.
  • When you want a fast request-response cycle time, Lambda may be a good choice. This is especially true if you are using large container images written in object-heavy languages such as Java/Scala that require significant initialization time to start the JVM and bootstrap objects.
  • By breaking down your application into smaller modules that fit into Lambdas and using Layers and Step Functions you can reap the benefits of Serverless architectures while paying only for your compute time.

Managing FaaS Services Deployed Across Different Cloud Providers


Sometimes you have to pick the best-of-breed solution for different needs. This is true among the services provided by the different cloud providers as well. For instance, when it comes to cognitive services, Google, Amazon, and Microsoft rule the roost. Even among them, Google is best at natural-language translation, landmark recognition, text extraction from images, and content-based search. Amazon is the leader in facial recognition. Similarly, I found Microsoft’s image labeling to be the best of the breed.

No longer do you have to settle for one cloud provider to solve your needs. With frameworks such as the Serverless Framework, you can develop services across various providers, and deploy and manage them in a cloud-agnostic fashion. With a single environment, you can develop, test, and deploy to most of the big cloud providers and react to cross-cloud events without having to worry about their idiosyncrasies.


This is a simple tutorial to demonstrate how to deploy multiple services on different cloud providers using the Serverless Framework.

More specifically, this tutorial walks you through deploying an image detection service on Google Cloud Platform (GCP) and managing it using a proxy service running on Amazon Web Services. The services on both platforms are 100% serverless.

The image detection service running on GCP uses Google’s FaaS solution viz., Cloud Functions and the proxy running on AWS uses Amazon’s FaaS solution viz., Lambda.

In a typical scenario, you will use a service such as this to detect the contents of an image uploaded to an S3 bucket and take appropriate actions based on the result. For instance, you could use it to blur or reject the image based on its vulgarity, or get the image labels and chain them to other services that translate the labels into multiple languages to cater to your customers’ needs.

Code Repository can be found here.


Setup Amazon AWS

  1. Sign in to your AWS account or sign up for one.

  2. Set up your AWS credentials by following the instructions from here.

Install node.js and Serverless framework

Serverless framework is a node.js application. To use Serverless framework and run the tutorial you need to install node.js. Follow the instructions from Serverless website to install both node.js and the Serverless framework.

Ensure your Serverless framework is operational using the following:

$ serverless --version

Testing your Serverless Setup

Now that you have set up AWS, it is time to test your Serverless setup by creating a mock function using the Serverless framework.

Create a test directory. In the test directory, create a Lambda function from the default template as follows:

$ mkdir sls-tester
$ cd sls-tester
$ sls create --template aws-python --name sls-test

This should create two files in the current directory: serverless.yml and handler.py.

The serverless.yml declares a sample service and a function. The handler.py returns a message stating that your function executed successfully.
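For orientation, the generated handler.py has roughly the following shape. This is a sketch reconstructed from the sample output shown in this tutorial; the exact contents generated by your version of the Serverless framework may differ.

```python
import json


def hello(event, context):
    """Echo the incoming event back with a success message."""
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event,
    }
    # API Gateway style response: a status code plus a JSON string body.
    return {"statusCode": 200, "body": json.dumps(body)}
```

The function name `hello` is what the `--function hello` argument in the commands of this tutorial refers to.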

To deploy the function, simply type:

$ sls deploy --verbose

This should deploy the function. The verbose option provides extra information.

To test your function, type:

$ sls invoke --function hello

If you get the following message, your Serverless setup is working.

    "body": "{\"input\": {}, \"message\": \"Go Serverless v1.0! Your function executed successfully!\"}",
    "statusCode": 200

To check the logs for your function, type:

$ sls logs -f hello

To keep a continuous check of the logs for your function, type:

$ sls logs -f hello -t

Setup Google Cloud

  1. Sign up for a new Google account at http://accounts.google.com. If you already have an account you can skip this step.
  2. Sign up for a Google Cloud trial at http://console.cloud.google.com/start. If you already have Google Cloud privileges on your Google account, you can skip this step.
  3. Create a new project and call it serverless-project (or a name of your choice).
  4. Select Credentials in API & Services section of the Google Cloud console.
  5. Under Create Credentials, create a new Service Account Key. Download the JSON key file to a secure place as you will need that in subsequent steps.
  6. In the API & Services dashboard, enable Google Cloud Vision API, Google Cloud Functions API, Google Cloud Deployment Manager API, Google Cloud Storage & Stackdriver Logging.

Image Detector


The gcp-label-image is the service that will be deployed on GCP. It is a node.js based service that takes an image URL passed through the HTTP request and sends it to Google Vision to detect the contents of the image and return a list of tags describing the content of the image.

The image URL should be passed as an HTTP parameter named imageUri. If this parameter is missing the service uses a default image to detect and return the contents.

Deploying the Image Detector Service

  1. Location: Go to the gcp-label-image subdirectory in the folder where you cloned the Git repository.
  2. Project: Replace the your-gcp-project-id in the serverless.yml file with your Google Cloud project id.
  3. Credentials: Replace the /path/to/your/gcp/credentials/json in the serverless.yml file with the path to the JSON credentials that you saved in the GCP setup.
  4. Deploy: In the service home directory, run the following command to deploy the detectLabel service on GCP. Make a note of the endpoint created. This endpoint is a URL that will end with detect as shown below.
    $ sls deploy --verbose
    Deployed functions
  5. Verify: You can check your Google Cloud Functions dashboard to ensure that your Cloud Function is deployed.
  6. Invoke@theTerminal: Invoke the function detectLabel as follows:
    $ sls invoke -f detectLabel
    Serverless: ekvy90t28px8 Image Results: landmark historic site sky tourist attraction ancient rome ancient history ancient roman architecture medieval architecture night building
  7. Invoke@theBrowser: Copy and paste the URL from the result of your sls deploy into the browser and add the imageUri parameter as follows: Far

Image Detector Proxy


The aws-gcp-proxy is the service that will be deployed on AWS. It is a Python-based service that will take an image URL passed through the HTTP request and send it to the Cloud Function deployed on GCP.

In typical use, you will use it to detect the content of an image uploaded to an S3 bucket and take appropriate actions based on the result. For instance, you could use it to blur or reject the image based on its vulgarity, or get the image labels and chain them to another service that translates the labels into multiple languages to cater to your customers’ needs.

The image URL should be passed as an HTTP parameter named imageUri. If this parameter is missing the service uses a default image URL to detect and return the contents.
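The proxy's behavior can be sketched as a Lambda handler along these lines. It is a minimal sketch rather than the code in the repository: the handler name, the default image URL, and the injectable `fetch` parameter are assumptions, while `GFUNC_URL` and `imageUri` are the names used in this tutorial.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder; the real service ships its own default image.
DEFAULT_IMAGE_URI = "https://example.com/default.jpg"


def build_target_url(gfunc_url, image_uri):
    """Append the imageUri query parameter, URL-encoded, to the Cloud Function URL."""
    return f"{gfunc_url}?{urlencode({'imageUri': image_uri})}"


def proxy_handler(event, context, fetch=None):
    """Forward the request to the Cloud Function named by GFUNC_URL."""
    # API Gateway passes None when there is no query string.
    params = event.get("queryStringParameters") or {}
    image_uri = params.get("imageUri", DEFAULT_IMAGE_URI)
    url = build_target_url(os.environ["GFUNC_URL"], image_uri)
    if fetch is None:
        def fetch(target):
            with urlopen(target, timeout=10) as response:
                return response.read().decode("utf-8")
    return {"statusCode": 200, "body": fetch(url)}
```

The `fetch` parameter exists only so that the handler can be exercised locally without network access; Lambda itself calls the handler with just `event` and `context`.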

Deploying the Image Detector Proxy Service

  1. Location: Go to the aws-gcp-proxy subdirectory in the folder where you cloned the Git repository.
  2. Environment Variable: Edit the setEnv.sh file to update GFUNC_URL so that it points to your image detector service running on GCP.
  3. Deploy: In the service home directory, run the following command to deploy the proxy service. Make a note of the AWS Gateway endpoint created. You will use this endpoint to test your service.
    $ sls deploy -v
    GET - https://urmileagemaydiffer.execute-api.us-east-1.amazonaws.com/dev/detect
  4. Verify: You can check your AWS Lambda dashboard to ensure that the Lambda function was created and the environment variable is being passed.
  5. Invoke: Copy and paste the AWS Gateway API URL into the browser and add the imageUri parameter as follows: Far


Serverless Framework makes it painless to deploy services across multiple cloud providers without having to deal with the idiosyncrasies of various providers allowing you to focus on your application. Additionally, the framework allows you to use the right provider for the right service, cuts the time spent on deployment while allowing you to manage the code and infrastructure across multiple providers.

Practical Machine Learning

Machine learning is getting more and more practical and powerful. With zero knowledge of programming, you can train a model to predict house prices in no time. The following blog post by Ofir Chakon explains the basic concepts of machine learning in simple, easy-to-understand terms.

Source: Practical machine learning: Ridge Regression vs. Lasso

Serverless Architecture & Serverless Framework

Per Gartner, by 2022 most cloud architectures will evolve to a fundamentally serverless model rendering the cloud platform architectures dominating in 2017 as legacy architectures.

Serverless is a cloud-native platform model and reflects the core-promise of cloud-computing by offering agility and capability on demand at a value price.

The introduction of function PaaS (fPaaS) in the form of Lambda by Amazon at re:Invent in Nov 2014 (and out of beta in late 2015) created momentum for the “serverless” platform architecture. AWS Lambda was soon followed by offerings from most major cloud platform vendors, including IBM, Microsoft, Google and, more recently, Oracle.

Amazon started the trend with Lambda

Separating the Wheat from the Chaff

The serverless computing model is an emerging trend and is quite often misunderstood because of the hype and build-up surrounding the topic.

The term Serverless refers to building applications without having to configure or maintain the infrastructure required for running your applications. In reality, servers are still involved, though they are owned and controlled by the platform providers. To add to the confusion, one of the frameworks used for exploiting the serverless architecture is uninspiringly named the Serverless Framework.

Serverless - No need to configure or maintain infrastructure

Serverless Architectures

Serverless Architectures are based on models where the application’s logic provided by the Developer is run on stateless, compute containers that are provisioned and managed by a provider. Typically these compute instances are ephemeral (short-lived for the duration of the request-response cycle) and triggered by an event. As the load on the application increases, additional infrastructure is automatically deployed to meet the need. Due to this on-demand provisioning nature of this architecture, the systems built using Serverless technologies are inherently scalable and highly responsive under load.

FaaS – Function as a Service

FaaS is the technique of building applications using Serverless architecture. Its defining characteristics are:

  • Pay-per-execution – You pay only when your code runs, which makes this model very efficient at managing costs.
  • Ephemeral – A short-lived process triggered by an event.
  • Auto-scaling – Compute resources are provisioned granularly per request.
  • Event-driven – Functions respond to events such as HTTP requests, file drops, alerts, timers and topics.
  • Microservices – Modules built to satisfy a specific goal, each using a simple, well-defined interface.

FaaS – Applications Built with Serverless Architecture

FaaS vs PaaS

Some people in the industry refer to the technique of building applications using Serverless architecture as FaaS (Function as a Service). The reason becomes clear when you contrast FaaS applications with traditionally built applications or PaaS (Platform as a Service), where a perpetual process runs on a server waiting for HTTP requests or API calls. In FaaS there is no perpetual process (for the most part), only an event mechanism that triggers the execution of a piece of code, usually just a function. You still need a perpetual gateway that fields your API calls and starts the cascade of events.
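To make the contrast concrete, here is a minimal sketch of what the "piece of code" can look like on AWS Lambda with Python. It assumes the function sits behind an API Gateway proxy integration, so the event carries a `queryStringParameters` field and the return value is a dictionary with `statusCode` and `body`. The greeting logic and the `name` parameter are made up for illustration.

```python
import json


def handler(event, context):
    """Runs once per event. No process stays alive waiting for requests."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

There is no server loop and no port to listen on here. The gateway receives the HTTP call, the platform invokes `handler` with the event, and the function is free to disappear once it returns.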

The other key operational difference between FaaS and PaaS is scaling. With most PaaS solutions you still need to worry about scale. With FaaS, compute resources are provisioned at the request level. You cannot get the same level of granularity with PaaS applications, even if you set them to auto-scale. As a result, FaaS applications are extremely efficient when it comes to managing cost.
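The cost argument can be illustrated with a little arithmetic. The sketch below compares an always-on server with a pay-per-execution function. All prices and durations in it are hypothetical numbers chosen to keep the arithmetic simple; they are not any provider's actual rates.

```python
def always_on_cost(hourly_rate, hours):
    """Cost of a server that runs for the whole period, busy or idle."""
    return hourly_rate * hours


def per_request_cost(requests, seconds_per_request, price_per_second):
    """Cost when you pay only for the seconds spent handling requests."""
    return requests * seconds_per_request * price_per_second


def break_even_requests(hourly_rate, hours, seconds_per_request, price_per_second):
    """Request volume at which both models cost the same."""
    fixed = always_on_cost(hourly_rate, hours)
    return fixed / (seconds_per_request * price_per_second)


# Hypothetical figures, for illustration only.
HOURLY_RATE = 0.05          # always-on server, per hour
PRICE_PER_SECOND = 0.00002  # function compute, per second of execution
HOURS_IN_MONTH = 720
SECONDS_PER_REQUEST = 0.2

if __name__ == "__main__":
    fixed = always_on_cost(HOURLY_RATE, HOURS_IN_MONTH)
    for requests in (100_000, 1_000_000, 10_000_000):
        variable = per_request_cost(requests, SECONDS_PER_REQUEST, PRICE_PER_SECOND)
        print(f"{requests:>10} requests: always-on {fixed:.2f}, per-request {variable:.2f}")
```

With these made-up numbers the always-on server costs about 36 per month regardless of traffic, while the function costs about 4 per million requests, so the two meet at roughly nine million requests a month. Below that volume, paying per request is cheaper. Above it, the always-on server wins.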

Limitations of FaaS

State

Due to the ephemeral nature of the FaaS architecture, the state of your application must be managed outside the FaaS infrastructure, typically by off-loading it to a cache or database. This can be very limiting for certain types of applications running on thin clients or untrusted devices, where the application orchestration has to extend across multiple request-response cycles.

State between Requests must be maintained outside of FaaS
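A minimal sketch of this idea follows. The `InMemoryStore` class is a hypothetical stand-in for an external cache or database, and the `session_id` field and step counter are invented for illustration. In a real deployment the store would be a network service, since anything kept in the function's own memory may vanish between invocations.

```python
class InMemoryStore:
    """Hypothetical stand-in for an external cache or database."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def make_handler(store):
    """Build a handler that keeps no state of its own between calls."""

    def handler(event, context):
        session_id = event["session_id"]
        # Load the conversation state from outside the function...
        state = store.get(session_id, {"step": 0})
        # ...advance it...
        new_state = {"step": state["step"] + 1}
        # ...and write it back before the function instance goes away.
        store.put(session_id, new_state)
        return new_state

    return handler
```

Because the handler reads and writes everything through `store`, a second handler instance (standing in for a freshly provisioned container) picks up exactly where the first one left off.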

Duration

Because of the on-demand provisioning and low-cost nature of the FaaS solution, there is a restriction on how long your functions are allowed to run. To keep the price low (you are billed for the time your code runs), providers such as AWS and Microsoft Azure cap how long a function may take to process a request.

Duration of time a function is allowed to run is restricted
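One way to live with this limit is to check how much time is left and stop cleanly before the platform cuts the function off. The sketch below assumes AWS Lambda's Python runtime, whose context object provides `get_remaining_time_in_millis()`. The `items` field, the upper-casing "work" and the `FakeContext` class are hypothetical and exist only so the example can run locally.

```python
def handler(event, context, safety_margin_ms=1000):
    """Process items until the time budget is nearly exhausted."""
    items = event["items"]
    processed = []
    for item in items:
        # Stop early rather than be killed mid-item by the platform.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        processed.append(item.upper())  # placeholder for real work
    # Whatever is left can be re-queued for a later invocation.
    return {"processed": processed, "remaining": items[len(processed):]}


class FakeContext:
    """Local stand-in for the Lambda context; returns scripted time budgets."""

    def __init__(self, budgets_ms):
        self._budgets = iter(budgets_ms)

    def get_remaining_time_in_millis(self):
        return next(self._budgets)
```

Returning the unprocessed remainder lets the caller hand it to another invocation, which is a common way to spread long-running work across several short executions.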

Deployment & Resource Limits

Some providers, such as AWS, limit the size of the deployment package, that is, the combined size of the code and libraries deployed in it. This can be severely limiting for some applications, such as image processing functions that depend on large libraries which have to be packaged along with the code. Additionally, there are limits on the number of concurrent function executions, on ephemeral disk capacity (temp space) and so on. While some of these are soft limits that can be raised by working with the provider, others are hard limits and will force you to reevaluate your design.

Resources are limited - Use wisely

Latency

Due to the on-demand provisioning nature of the FaaS infrastructure, applications written in languages such as Java or Scala, which need a long start-up time to spin up a JVM, may see higher latency whenever a new instance has to be started. Having said that, providers optimize infrastructure spin-up based on the usage patterns of the functions. Because Python and JavaScript are interpreted and their runtimes start quickly, functions written in these languages may not see a significant difference in latency between a PaaS and a FaaS offering.

Test the performance of your applications thoroughly
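A common way to soften start-up latency is to do expensive initialization once per function instance and reuse it on later invocations, relying on the platform keeping a warm instance around between requests. The sketch below illustrates the pattern. The `_load_expensive_resource` function and its simulated delay are hypothetical placeholders for loading a model, opening a connection, and so on.

```python
import time

_RESOURCE = None   # cached for as long as this function instance stays warm
INIT_COUNT = 0     # counts how often the slow path actually ran


def _load_expensive_resource():
    """Hypothetical slow start-up work, simulated with a short sleep."""
    time.sleep(0.05)
    return {"model": "loaded"}


def get_resource():
    """Initialize on first use, then reuse on every later invocation."""
    global _RESOURCE, INIT_COUNT
    if _RESOURCE is None:
        _RESOURCE = _load_expensive_resource()
        INIT_COUNT += 1
    return _RESOURCE


def handler(event, context):
    resource = get_resource()
    return {"model": resource["model"], "initializations": INIT_COUNT}
```

Only the first call on a given instance pays the start-up cost. How long an instance stays warm is up to the provider, so measurements under realistic traffic are still needed.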

The Players

While new providers are entering the market to ride the Serverless wave, the significant players are Amazon with AWS Lambda, Microsoft with Azure Functions, Google with Google Cloud Functions and IBM with OpenWhisk. Of these, AWS Lambda is the dominant player.

Amazon's AWS Lambda is the dominant player

Serverless Framework

While not having to manage infrastructure is nice, dealing with hundreds of functions in a project spread across multiple providers, along with their buckets, messaging and permissions, becomes an issue in itself. Additionally, organizations want to diversify risk and hence do not want to be bound to a single provider.

Add to this mix the idiosyncrasies of each provider's FaaS offering. Not only do you have to learn the different terminology used by the various providers, you also have to learn how to use their offerings through their respective consoles or CLIs (Command Line Interfaces).

To help you avoid vendor lock-in and deploy your FaaS solutions to various providers, the Serverless Framework comes to your rescue. It allows you to deploy auto-scaling, pay-per-execution, event-driven functions to any cloud. It currently supports AWS Lambda, IBM Bluemix OpenWhisk and Microsoft Azure, and is expanding to support other cloud providers.

The Serverless Framework is an MIT-licensed open-source project, actively maintained by a vibrant and engaged community of developers. It provides robust plugins for various FaaS providers and can be extended when needed.

The Serverless Framework allows you to provision and deploy REST APIs, backend services, data pipelines and other use cases. It does this by providing a framework and CLI for building serverless services across many providers while abstracting away provider-level complexity.

The Serverless Framework is different from other application frameworks because:

  • It manages your code as well as your infrastructure
  • It supports multiple languages (Node.js, Python, Java, and more)

Serverless framework allows choice of FaaS providers across a single project

Core concepts of Serverless Framework

Serverless Framework consists of the following core concepts:

  • Service
  • Function
  • Events
  • Resources
  • Plugins

Service

A Service in the Serverless Framework is the unit of organization. It’s where you define your Functions, the Events that trigger them, and the Resources your Functions use, all in one file titled serverless.yml. More information at https://goo.gl/9SKBvx

An application can have multiple services and hence multiple serverless.yml files.

Functions

A Function is an independent unit of deployment, or microservice. It manifests itself as a Lambda or an Azure Function depending upon the provider. It’s merely code, deployed in the cloud, that is most often written to perform a single job such as:

  • Saving a user to the database
  • Processing a file in a database
  • Performing a scheduled task

Events

Anything that triggers a Function to execute is regarded by the Framework as an Event. Events on AWS include:

  • An AWS API Gateway HTTP endpoint request (e.g., for a REST API)
  • An AWS S3 bucket upload (e.g., for an image)
  • A CloudWatch timer (e.g., run every 5 minutes)
  • An AWS SNS topic (e.g., a message)
  • A CloudWatch Alert (e.g., something happened)

When you define an event for your functions in the Serverless Framework, the Framework will automatically create any infrastructure necessary for that event (e.g., an API Gateway endpoint) and configure your Functions to listen to it.

Simply put, events are the things that trigger your functions to run. If you are using AWS as your provider, an event in the service can be anything in AWS that can trigger an AWS Lambda function, such as an S3 bucket upload, an SNS topic or an HTTP endpoint created via API Gateway.

Upon deployment, the framework will deploy any infrastructure required for an event (e.g., an API Gateway endpoint) and configure your function to listen to it.
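On the function side, each of these triggers arrives as a differently shaped event. The sketch below shows one way a Python handler might tell them apart. It relies on field names AWS uses in its event payloads as far as I know (`httpMethod` for API Gateway proxy requests, `eventSource` or `EventSource` in `Records` for S3 and SNS, `source` set to `aws.events` for scheduled events). The return values are invented for illustration.

```python
def classify_event(event):
    """Guess which AWS service produced the event from its shape."""
    if "httpMethod" in event:
        return "http"
    records = event.get("Records") or []
    if records:
        first = records[0]
        # S3 records spell the key "eventSource"; SNS records use "EventSource".
        source = first.get("eventSource") or first.get("EventSource")
        if source == "aws:s3":
            return "s3"
        if source == "aws:sns":
            return "sns"
    if event.get("source") == "aws.events":
        return "schedule"
    return "unknown"


def handler(event, context):
    kind = classify_event(event)
    if kind == "http":
        return {"statusCode": 200, "body": "ok"}
    if kind == "s3":
        s3 = event["Records"][0]["s3"]
        return {"bucket": s3["bucket"]["name"], "key": s3["object"]["key"]}
    if kind == "sns":
        return {"message": event["Records"][0]["Sns"]["Message"]}
    if kind == "schedule":
        return {"ran_at": event.get("time")}
    raise ValueError("unsupported event type")
```

In practice it is usually cleaner to give each event its own function, as the Framework encourages, so that no such dispatching is needed. The example is only meant to show what the events look like from inside the code.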

Resources

Resources are infrastructure components which your Functions use. If you use AWS as your provider, examples of resources are:

  • An AWS DynamoDB Table (e.g., for saving Users/Posts/Comments data)
  • An AWS S3 Bucket (e.g., for saving images or files)
  • An AWS SNS Topic (e.g., for sending messages asynchronously)

Anything that can be defined in CloudFormation is supported by the Serverless Framework. The Serverless Framework not only deploys your Functions and the Events that trigger them, but it also deploys the infrastructure components your Functions depend upon.

Credentials

Serverless Framework needs access to your cloud provider account credentials to deploy resources on your behalf. For AWS, you can set them up with the AWS CLI (aws configure). The setup for Azure is more involved.

The following links provide excellent guidance on setting up credentials for the providers currently supported by the Serverless Framework.

  • AWS – https://serverless.com/framework/docs/providers/aws/guide/credentials/
  • Azure – https://serverless.com/framework/docs/providers/azure/guide/credentials/
  • OpenWhisk – https://serverless.com/framework/docs/providers/openwhisk/guide/credentials/

Deployment

Serverless Framework translates the service declaration in the serverless.yml file into a CloudFormation or Azure Resource Manager template, depending on the provider you choose.

To deploy your service, functions, and provision the resources all at once, enter:

serverless deploy --verbose

To deploy a single function after making changes to it, enter:

serverless deploy function --function <myfunction> --verbose

Invoking

Serverless Framework allows you to invoke a function locally for testing or invoke a deployed function.

To invoke your function locally, enter:

serverless invoke local --function <myfunction> --log

To invoke a deployed function, enter:

serverless invoke --function <myfunction> --stage <mystage> --region <myregion>

If you omit the stage and region options, the default stage (dev) and the region specified in your provider configuration will be used.

CelebritySleuth – A Sample Use case

CelebritySleuth is a celebrity face recognition service built using Serverless Framework, Twilio, Amazon Rekognition and the IMDbPY API.

The CelebritySleuth application is event-driven. It uses the user’s mobile SMS/MMS as the presentation tier and Twilio as the middle tier that bridges the SMS world to Amazon API Gateway. Behind the gateway, a set of AWS Lambda functions written in Python use Amazon Rekognition for image processing and IMDb for gathering information on the celebrities.

CelebritySleuth code repository, installation guide, and usage is at https://github.com/skarlekar/faces

How it works

To begin with, you have to train the application to recognize the faces by building a collection of celebrities. You do this by sending a random sample of celebrity pictures (image URLs) and their corresponding names. The more pictures of a celebrity, the more accurate the recognition will be.

The CelebritySleuth application consists of two services:

  • Twilio Communication Service
  • Face Recognition Service

The services are decoupled to allow for using different presentation tiers in future.

Architecture

The CelebritySleuth application uses Lambda functions for its computing needs. As a result, the application components are provisioned just before use and brought down afterwards, resulting in a low-cost, highly scalable application.

The above picture illustrates the high-level architecture of the application. Details are as follows:

  1. The user sends a picture and commands to add/match a face to a collection. Alternatively, the user can create a collection – in which case a picture is not required. The SMS/MMS is sent to a telephone number hosted by Twilio.

  2. Twilio intercepts the message and forwards it to an API Gateway based on the user’s Twilio configuration.

  3. API Gateway translates TwiML to JSON and calls the Request Processor lambda function.

  4. The Request Processor lambda validates the commands and puts a message on the appropriate topic on SNS. If the validation fails, it returns the error message to the user via Twilio.

  5. When a message arrives in the Create Collection topic, a lambda is triggered which adds the named collection in AWS Rekognition via Boto libraries. A success/error message is put in the Response Processor topic.

  6. When a message arrives in Add Face topic, a lambda is triggered which identifies the most prominent face in the image and adds the metadata for the face to the given collection. If there are no faces identified, it creates an error message and sends the response to the Response Processor topic.

  7. When a message arrives in Match Face topic, a lambda is triggered which identifies the most prominent face in the image and matches the metadata for that face with known faces in the collection. If a match is found, the corresponding person’s name is returned. The Lambda then uses IMDB to look up the biography of the person.

  8. The various lambda-based processors put the response message on the Response Processor topic.

  9. The Response Processor picks up the response and constructs an SMS message and calls Twilio’s SMS service.

  10. Twilio validates the From number and sends the message to the corresponding To number.
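The validation and routing in step 4 can be sketched as follows. This is not the code from the CelebritySleuth repository. The command words (`create`, `add`, `match`), the topic names, the message layout and the `publish` callback are all assumptions made for illustration, with `publish` standing in for a call that puts a message on an SNS topic.

```python
# Hypothetical mapping from SMS command to topic name.
TOPICS = {
    "create": "create-collection",
    "add": "add-face",
    "match": "match-face",
}
# Commands that only make sense when a picture is attached.
NEEDS_IMAGE = {"add", "match"}


def process_request(body, media_url, publish):
    """Validate an incoming SMS/MMS and route it to a topic.

    Returns None on success, or an error message to send back to the user.
    """
    parts = body.strip().split()
    if not parts:
        return "Error: empty command"
    command, args = parts[0].lower(), parts[1:]
    if command not in TOPICS:
        return f"Error: unknown command '{command}'"
    if not args:
        return "Error: collection name required"
    if command in NEEDS_IMAGE and not media_url:
        return f"Error: '{command}' requires a picture"
    publish(TOPICS[command], {
        "collection": args[0],
        "args": args[1:],
        "image_url": media_url,
    })
    return None
```

Passing `publish` in as a parameter keeps the validation logic free of any AWS dependency, so it can be exercised locally with a simple list-appending function before it is wired to a real topic.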


The application consists of the following components:

  1. Python – Python is a programming language that lets you work quickly and integrate systems more effectively.

  2. Twilio – Twilio Messaging Service lets the user communicate with the application through SMS.

  3. AWS Lambda – AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running.

  4. AWS Rekognition – Amazon Rekognition is a service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You can also search and compare faces.

  5. IMDb – IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters, and companies.
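To give a feel for how the Lambda functions might talk to Rekognition, here is a rough sketch of the add-face and match-face operations. It is not taken from the CelebritySleuth repository. It assumes a Rekognition client such as the one returned by `boto3.client("rekognition")` is passed in, and uses the `index_faces` and `search_faces_by_image` operations with the parameter names I believe the API defines. The helper names and the threshold value are hypothetical.

```python
def add_face(client, collection_id, image_bytes, person_name):
    """Index the most prominent face in the image under the given name."""
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        # The name travels with the face; as far as I know this field does
        # not accept spaces, so names may need underscores or similar.
        ExternalImageId=person_name,
        MaxFaces=1,
    )
    records = response.get("FaceRecords", [])
    if not records:
        raise ValueError("no face found in image")
    return records[0]["Face"]["FaceId"]


def match_face(client, collection_id, image_bytes, threshold=80.0):
    """Return the name stored with the best-matching face, or None."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        MaxFaces=1,
        FaceMatchThreshold=threshold,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    return matches[0]["Face"]["ExternalImageId"]
```

Taking the client as an argument means the functions can be tried out with a fake object locally, and given a real client only inside the deployed Lambda.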


Application in Action

Following is a sample animation of the application in action:


Serverless Framework is an accelerator for adopting Serverless architecture. It promises significantly less DevOps, lower cost, high scalability and multiple deployment options across a variety of providers.

Apart from providing scaffolding to deploy Lambdas, the Serverless Framework allows you to manage multiple Lambdas and their related infrastructure across multiple regions and stages. To top it off, it allows you to manage the equivalent of Lambda functions across multiple providers.

In my testing, I could not deploy the CelebritySleuth application on Microsoft Azure because the Serverless Framework does not currently support deploying functions written in Python there. However, from speaking with Austin Collins, the founder of Serverless Framework, at the Serverless Conf 2017 in Austin, I gather that his team is working on supporting as many languages as the cloud providers themselves support.

Aside from this, I was able to build the CelebritySleuth application from start to end in a couple of hours using the Serverless Framework, compared to half a day for setting up the components manually through the AWS console.

How to reduce the life of a software defect?

While building software for large projects with zero defects is virtually impossible, the quality of the software can be judged by the lifetime of a defect, that is, the time it takes for a defect to be identified, fixed and released. A defect in a well-isolated module can be identified, root-cause analyzed and fixed in a shorter amount of time than one in a tightly coupled system.

In addition, adopting an agile development methodology helps reduce the lifetime of a defect drastically. This is possible because well-tested software is released in short bursts or iterations, delivering higher quality software compared to other development methodologies.

In an agile project, tests are written before or alongside the code. Therefore, all code that is delivered for testing is already tested code. In most agile projects there are four primary layers of testing:


  • Unit testing, also called development testing.
  • Acceptance testing, also called functional testing.
  • Component testing.
  • System and performance testing – integration testing and testing for non-functional requirements.


In agile methodology, the primary goal is to test all these layers during the course of each iteration.
