AWS Lambda Performance and Cost Optimization
The following are key factors affecting the performance of serverless applications:
- Cold starts
- Memory and profiling
- Architecture and best practices
How does Lambda work under the hood?

AWS Lambda functions powered by Graviton2 provide up to 34% better price performance and 20% lower cost compared to x86-based functions.
The AWS Graviton2 processor contains 64-bit Arm Neoverse cores optimized for cloud-native applications.
A Lambda function can be deployed as a container image or a .zip file and can run on x86 or Arm-based processors.

Points to consider before migrating an existing application from x86 to Arm (a deployment sketch follows this list):
- Interpreted and bytecode-compiled languages can run without modification
- Compiled languages need to be recompiled for arm64
- Lambda container images need to be rebuilt for Arm
- AWS tools and SDKs support Graviton2 transparently.
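As a rough illustration, an existing .zip-based function can be switched to arm64 when updating the function code. This is a minimal boto3 sketch, not a migration recipe; the function name, bucket, and key are hypothetical, and the package must already be rebuilt for arm64 if it contains native dependencies.

import boto3

lambda_client = boto3.client("lambda")

# Repoint the function at an arm64 build of the package (names are hypothetical).
lambda_client.update_function_code(
    FunctionName="my-function",
    S3Bucket="my-artifacts-bucket",
    S3Key="my-function-arm64.zip",
    Architectures=["arm64"],  # switch from the default x86_64 to Graviton2
)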
Anatomy of a Lambda function (see the sketch after this list):
- Handler() function → Function to be run upon invocation
- Event object → Data sent during Lambda function invocation
- Context object → Methods and properties for interacting with the runtime and execution environment
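A minimal Python sketch tying these three pieces together (the return payload is purely illustrative):

def lambda_handler(event, context):
    # event: the data sent during invocation (a dict for most event sources)
    # context: runtime information such as the request id and remaining time
    return {
        "request_id": context.aws_request_id,
        "remaining_ms": context.get_remaining_time_in_millis(),
        "input": event,
    }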
Function life cycle (worker host):
Full cold start: steps performed when a Lambda function runs for the first time
- Downloads Code
- Starts Execution Environment
- Initialize Runtime
- Run Handler Code
Warm Start: a subsequent request is served by an existing execution environment, so only one step is needed
- Run Handler code
Lambda start-up involves two types of optimization (illustrated in the sketch below):
- AWS optimization → Downloads Code, Starts Execution Environment
- Developer optimization → Initialize Runtime, Run Handler Code
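In code, the split looks like this: everything above the handler is static initialization executed during INIT, while the handler body runs on every invocation. A minimal sketch, with a hypothetical table name:

import boto3

# Static initialization: runs once per execution environment, during INIT.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("items")  # hypothetical table name

def lambda_handler(event, context):
    # Handler code: runs on every invocation, warm or cold.
    return table.get_item(Key={"id": event["id"]})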
Measuring performance of Lambda functions:
AWS X-Ray helps to measure the performance of Lambda functions. Tracing can be enabled in an AWS SAM template:
Type: AWS::Serverless::Function
Properties:
  FunctionName: !Ref LambdaFunctionName
  CodeUri:
    Bucket: !Ref BucketName
    Key: !Ref LambdaZipName
  Handler: request_handler.lambda_handler
  Runtime: python3.8
  MemorySize: 320
  Tracing: Active
Using the X-Ray SDK at the code level helps to add custom annotations and metadata:

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture("upload_to_s3_bucket")
def upload_to_s3_bucket(data_feed: str):
    pass
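Annotations and metadata can then be attached to the current subsegment: annotations are indexed and searchable in the X-Ray console, while metadata is not indexed but can hold richer objects. A sketch with hypothetical keys and values:

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture("process_feed")
def process_feed(data_feed: str):
    # Annotations are indexed and searchable; keep them to simple scalar values.
    xray_recorder.put_annotation("feed_type", "daily")
    # Metadata is not indexed but accepts arbitrary JSON-serializable objects.
    xray_recorder.put_metadata("feed_preview", data_feed[:100])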
Three areas of performance:
- Latency
- Throughput
- Cost
Cold Starts:
Function life cycle

Cold Start → Execution Environment
Facts:
- Affects < 1% of invocations in production environments
- Varies from < 100 ms to > 1 s
Considerations:
- Pinging functions to keep them warm has limited effect
- Targeting warm environments is difficult
Cold start causes:
- Environments reaped
- Failure in underlying resources
- Rebalancing across AZs
- Updating code or configuration flushes existing environments
- Scaling up
Cold starts → Execution environments are influenced by the following factors from the AWS side.
- Memory allocation
- Size of function package
- How often a function is called
- Internal algorithms
Cold starts → Static initialization (developer responsibility):
- Code run before handler
- Initialization of objects and connections
- New Execution environment running the first time
- Scaling up
Cold starts are influenced by the size of the function package, the amount of code, and the amount of initialization work. Optimizing static initialization is the developer's responsibility.
Code Optimization:
- Trim SDKs
- Reuse connections
- Don’t load if not used
- Lazily load variables
Provisioned Concurrency:
If using a large library is mandatory, use provisioned concurrency to avoid cold starts caused by code size. Provisioned concurrency works by pre-creating execution environments and running the INIT code ahead of invocations.
The following are use cases for provisioned concurrency:
- Latency Sensitive and interactive workloads
- More consistent latency across the long tail of performance (p95, p99, and p100 levels)
- Minimal change to code
- Integrated with AWS Autoscaling
- Adds a cost for each unit of provisioned concurrency, but with a lower duration cost per invocation; this can save money when heavily utilized
Function life cycle: Provisioned Concurrency Start

API Call to Lambda Function during provisioned concurrency start.

Provisioned Concurrency: Things to know
- Reduces start time to < 100 ms
- Can’t be configured for $LATEST; use versions/aliases (see the sketch after this list)
- Provisioning ramps up at 500 per minute
- No changes to function handler code performance
- Requests above the provisioned concurrency follow on-demand Lambda limits and behavior for cold starts, bursting, and performance
- The overall per-region account concurrency limit still applies
- Widely supported by CloudFormation, Terraform, the Serverless Framework, etc.
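For example, provisioned concurrency can be configured against an alias with a single boto3 call. A minimal sketch; the function name and alias are hypothetical:

import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency must target a published version or an alias, not $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="Live",  # alias pointing at a published version
    ProvisionedConcurrentExecutions=10,
)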
Things to know on the AWS side for provisioned concurrency:
- AWS provisions more than the requested limit
- Environment reaping still applies
- There is less CPU burst than on-demand during init
Provisioned Concurrency: Application Auto Scaling
Auto scaling is used for scenarios in which the required capacity is not clearly known. It can be configured with a min/max setting or an alarm-based (target tracking) value. The ScalableTarget below registers the alias with min/max capacity; a scaling-policy sketch follows it.
ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 100
    MinCapacity: 1
    ResourceId: !Sub function:${logicalName}:Live
    RoleARN: !Join
      - ':'
      - - 'arn:aws:iam:'
        - !Ref 'AWS::AccountId'
        - role/aws-service-role/lambda.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_LambdaConcurrency
    ServiceNamespace: lambda
    ScalableDimension: lambda:function:ProvisionedConcurrency
  DependsOn: FunctionAliasLive
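To complete the alarm-based side, a target tracking policy can be attached to the same scalable target. This boto3 sketch assumes the alias above and a hypothetical function name, and keeps provisioned-concurrency utilization around 70%:

import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.put_scaling_policy(
    PolicyName="pc-utilization",
    ServiceNamespace="lambda",
    ResourceId="function:my-function:Live",  # hypothetical function name
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Scale so that provisioned-concurrency utilization stays near 70%.
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)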
Memory Usage and Profiling:
Lambda CPU performance varies with memory allocation: increasing memory increases CPU power. If the Lambda code is CPU- or network-intensive, allocating more memory can dramatically improve performance.
CPU-bound function example: compute all primes ≤ 1,000,000, 1,000 times.
128 MB  → 11.722 s → $0.024628
256 MB  → 6.678 s  → $0.028035
512 MB  → 3.194 s  → $0.026830
1024 MB → 1.465 s  → $0.024638
Because Lambda billing is based on GB-seconds, tuning a function to more memory can be cheaper overall, since total execution time drops. The arithmetic can be checked directly:
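A quick sanity check of the 128 MB row, assuming the x86 us-east-1 prices at the time ($0.0000166667 per GB-second plus $0.0000002 per request; both vary by region and over time):

# Cost = GB-seconds consumed * GB-second price + per-request charge.
GB_SECOND_PRICE = 0.0000166667  # assumed x86 price per GB-second
REQUEST_PRICE = 0.0000002       # assumed price per request

invocations = 1_000
memory_gb = 128 / 1024
duration_s = 11.722

cost = invocations * (memory_gb * duration_s * GB_SECOND_PRICE + REQUEST_PRICE)
print(f"${cost:.6f}")  # ≈ $0.024621, matching the 128 MB row above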
AWS Lambda Power Tuning: an open-source tool that helps to fine-tune Lambda memory vs. performance vs. cost.
Git repo: https://github.com/alexcasalboni/aws-lambda-power-tuning
Features:
- Data-driven cost and performance optimization
- Available from Serverless Repository
- Easy to integrate with CI/CD
- Compares two functions
{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": "{}",
  "parallelInvocation": true,
  "strategy": "cost|speed|balanced",
  "balancedWeight": 0.5
}

powerValues can also be set to "ALL". balancedWeight is a value between 0 and 1, where 0 leans toward the speed strategy and 1 toward the cost strategy. With autoOptimize, the tool applies the best value to the given alias:

{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": "{}",
  "parallelInvocation": true,
  "autoOptimize": true,
  "autoOptimizeAliases": "Live"
}
Using Power Tuning, we can compare the performance vs. cost vs. speed of x86 and Arm/Graviton2 processors.
Architecture and best practices:
Optimization Best Practices:
- Avoid monolithic functions (reduces deployment package size; micro/nano services)
- Minify/uglify production code
- Optimize dependencies
- Lazy initialization of shared libs/objects (helps if there are multiple functions per file)
Optimized dependency usage (Node.js SDK & X-Ray):

// const AWS = require('aws-sdk')
const DynamoDB = require('aws-sdk/clients/dynamodb') // 125 ms faster

X-Ray usage:

// const AWSXray = require('aws-xray-sdk')
const AWSXray = require('aws-xray-sdk-core') // 5 ms faster
Lazy initialization example (Python and boto3):

import boto3

s3_client = None
ddb_client = None

def get_objects(event, context):
    global s3_client
    if not s3_client:
        s3_client = boto3.client("s3")
    # business logic

def get_items(event, context):
    global ddb_client
    if not ddb_client:
        ddb_client = boto3.client("dynamodb")
    # business logic
Note: lazy initialization is not a great option if we are using provisioned concurrency, since the INIT code runs ahead of invocations anyway.
Amazon RDS Proxy:
A fully managed, highly available database proxy for Amazon RDS. It pools and shares connections to make applications more scalable, more resilient to database failures, and more secure.
Optimization best practices (performance/cost):
- Externalize orchestration → Avoid idle/sleep in code; delegate to a Step Functions workflow
- Fine-tune resource allocation → Don’t guess; measure and set function memory
- Transform, not transport → Minimize data transfer (S3 Select, advanced filtering)
- Lambda Destinations → Simplified chaining (async) and DLQ
- Discard uninteresting events as soon as possible → Trigger configuration (S3 prefix, SNS filter)
- Keep retry policies in mind → Very powerful, but not free
Reusing connections with Keep-Alive:
- For functions making HTTPS requests
- Enable in the SDK with environment variables (see the sketch after this list)
- Or with the Keep-Alive property in function code
- Can reduce a typical DynamoDB operation from 30 ms to 10 ms
- Available in most runtime SDKs
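For Node.js runtimes, the environment-variable route can be applied from outside the function. A boto3 sketch with a hypothetical function name:

import boto3

lambda_client = boto3.client("lambda")

# AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 makes the Node.js AWS SDK (v2)
# reuse TCP connections with Keep-Alive instead of opening one per call.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    Environment={"Variables": {"AWS_NODEJS_CONNECTION_REUSE_ENABLED": "1"}},
)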
Service Integration:
Service integration is common in modern application development.
- Adding services increases latency; this is mainly a synchronous concern, e.g., a service invokes Lambda and Lambda calls another service
- Use asynchronous rather than synchronous integration where possible
- Use VTL where appropriate
- Use the Step Functions direct SDK integration feature
- Use Lambda to transform, not transport, data
- Avoid Lambda calling Lambda
Lambda Invocation Models:
Lambda has three invocation models (see the sketch after this list):
- Synchronous(request/response)
- Asynchronous(event)
- Event Source Mapping(stream/queue poller)
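The first two models are chosen by the caller at invoke time; event source mappings are instead configured on the Lambda side. A boto3 sketch with a hypothetical function name and payload:

import json
import boto3

lambda_client = boto3.client("lambda")

# Synchronous (request/response): the caller blocks until the function returns.
response = lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="RequestResponse",
    Payload=json.dumps({"id": "123"}),
)

# Asynchronous (event): Lambda queues the event and returns immediately.
lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=json.dumps({"id": "123"}),
)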
Comparing sync vs. async Lambda services:
Lambda Service A → Lambda Service B
Synchronous:
- The caller is waiting
- Waiting incurs cost
- A downstream slowdown affects the entire process
- Process change is complex
- Passes payloads between steps
Lambda Service A → SQS → Lambda Service B
Asynchronous:
- The caller receives an ack quickly
- Minimizes the cost of waiting
- Queueing separates fast and slow processes
- Process change is easy
- Passes transaction IDs (see the sketch after this list)
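A sketch of Service A's side of this pattern, assuming a hypothetical queue URL: it acknowledges quickly and passes a transaction id rather than the full payload.

import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/service-b-queue"  # hypothetical

def lambda_handler(event, context):
    # Enqueue work for Service B and acknowledge immediately.
    transaction_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"transaction_id": transaction_id}),
    )
    return {"accepted": True, "transaction_id": transaction_id}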
Lambda Performance → Summary
Cold Starts:
- Cause of cold start
- VPC improvements
- Provisioned concurrency
Memory and Profiling:
- Memory is the power lever of Lambda
- AWS Lambda Power Tuning
- Trade-off cost and speed
Architecture and Optimization:
- Best Practices
- RDS Proxy
- Async and Service Integration