Stackademic

Stackademic is a learning hub for programmers, devs, coders, and engineers. Our goal is to democratize free coding education for the world.

AWS Lambda Performance and Cost Optimization

Ram Vadranam
Published in Stackademic
7 min read · Nov 15, 2021


  1. Cold starts
  2. Memory and profiling
  3. Architecture and best practices

How does Lambda work under the hood?

An AWS Lambda function powered by Graviton2 provides up to 34% better performance and 20% lower cost than an x86-based function.

The AWS Graviton2 processor contains 64-bit Arm Neoverse cores optimized for cloud-native applications.

A Lambda function can be deployed as a container image or a .zip file and run on an x86 or Arm-based processor.

Points to consider before migrating an existing application from x86 to Arm:

  1. Interpreted and bytecode-compiled languages can run without modification.
  2. Compiled languages need to be recompiled for arm64.
  3. Lambda container images need to be rebuilt for Arm
  4. AWS tools and SDKs support Graviton2 transparently.
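As a sketch, the target architecture is selected in a SAM template with the Architectures property (the function shown here is illustrative; arm64 selects Graviton2, while x86_64 is the default):

```yaml
Type: AWS::Serverless::Function
Properties:
  Runtime: python3.8
  Architectures:
    - arm64   # Graviton2; omit or use x86_64 for Intel/AMD
```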

Anatomy of a Lambda function:

  1. Handler() function → The function run upon invocation
  2. Event object → Data sent during Lambda function invocation
  3. Context object → Methods to interact with the runtime and execution environment
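The three pieces above fit together in a minimal handler like this (a hypothetical sketch; getattr is used so the function also runs outside Lambda, where context would be None):

```python
# Minimal sketch of a Lambda handler (hypothetical example).

def lambda_handler(event, context):
    # Event object: data sent during invocation (a dict for JSON payloads).
    name = event.get("name", "world")

    # Context object: runtime and execution-environment information,
    # e.g. the unique id Lambda assigns to this request.
    request_id = getattr(context, "aws_request_id", "local-test")

    return {"statusCode": 200, "body": f"Hello, {name}! (request {request_id})"}
```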

Function life cycle (worker host):

Full Cold Start: Steps performed when a Lambda function runs for the first time:

  1. Download code
  2. Start execution environment
  3. Initialize runtime
  4. Run handler code

Warm Start: A subsequent request is served by an execution environment that already exists:

  1. Run handler code

The Lambda start-up sequence contains two types of optimization:

  1. AWS optimization → Download code, start execution environment
  2. Developer optimization → Initialize runtime, run handler code

Measuring performance of Lambda Function:

AWS X-Ray helps measure the performance of Lambda functions. Tracing can be enabled in the SAM template:

Type: AWS::Serverless::Function
Properties:
  FunctionName: !Ref LambdaFunctionName
  CodeUri:
    Bucket: !Ref BucketName
    Key: !Ref LambdaZipName
  Handler: request_handler.lambda_handler
  Runtime: python3.8
  MemorySize: 320
  Tracing: Active

Using the X-Ray SDK at the code level helps add custom annotations and metadata:

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture("upload_to_s3_bucket")
def upload_to_s3_bucket(data_feed: str):
    pass

Three areas of performance:

  1. Latency
  2. Throughput
  3. Cost

Cold Starts:

Function life Cycle

Cold Start → Execution Environment

Facts:

  1. Affects < 1% of invocations in production environments
  2. Varies from < 100 ms to > 1 s

Considerations:

  1. Pinging functions to keep them warm has limited effect
  2. Targeting warm environments is difficult

Cold start causes:

  1. Environments reaped
  2. Failure in underlying resources
  3. Rebalancing across AZs
  4. Updating code/config flushes
  5. Scaling up

Cold starts → Execution environment setup is influenced by the following factors on the AWS side:

  1. Memory allocation
  2. Size of function package
  3. How often the function is called
  4. Internal algorithms

Cold starts → Static initialization (developer responsibility):

  1. Code that runs before the handler
  2. Initialization of objects and connections
  3. Runs whenever a new execution environment handles its first request
  4. Runs again when scaling up creates new environments

Cold starts are influenced by the size of the function package, amount of code, and initialization work.

Optimizing static initialization is the developer's responsibility.

Code Optimization:

  1. Trim SDKs
  2. Reuse connections
  3. Don’t load if not used
  4. Lazily load variables

Provisioned Concurrency:

If using a large library is unavoidable, use provisioned concurrency to avoid cold starts caused by package size. Provisioned concurrency works by pre-creating execution environments and running the INIT code before invocations arrive.

Following are use cases for provisioned concurrency:

  1. Latency-sensitive and interactive workloads
  2. Improved consistency in the long tail of latency (P95, P99, and P100)
  3. Minimal change to code
  4. Integrated with AWS Auto Scaling
  5. Adds a cost per unit of provisioned concurrency, but a lower duration cost per invocation; this can save money when heavily used
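As a sketch, provisioned concurrency can be configured in a SAM template on a published alias (the alias name Live and the count of 10 are illustrative):

```yaml
Type: AWS::Serverless::Function
Properties:
  Handler: request_handler.lambda_handler
  Runtime: python3.8
  AutoPublishAlias: Live   # required: provisioned concurrency targets a version/alias, not $LATEST
  ProvisionedConcurrencyConfig:
    ProvisionedConcurrentExecutions: 10
```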

Function life cycle: Provisioned Concurrency Start

API Call to Lambda Function during provisioned concurrency start.

Provisioned Concurrency: Things to know

  1. Reduces start time to < 100 ms
  2. Can’t be configured for $LATEST; use versions/aliases
  3. Provisioning ramps up at 500 per minute
  4. No change to function handler code performance
  5. Requests above the provisioned concurrency follow on-demand Lambda limits and behavior for cold starts, bursting, and performance
  6. The overall per-region account concurrency limit still applies
  7. Supported by CloudFormation, Terraform, the Serverless Framework, etc.

Things to know on the AWS side for provisioned concurrency:

  1. AWS provisions more than the requested limit
  2. Environment reaping still applies
  3. Less CPU burst than on-demand during init

Provisioned Concurrency: Application Auto Scaling

Auto scaling is used when the required capacity is not known precisely. It can be configured with min/max settings or alarm-based values.

ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 100
    MinCapacity: 1
    ResourceId: !Sub function:${logicalName}:Live
    RoleARN: !Join
      - ':'
      - - 'arn:aws:iam:'
        - !Ref 'AWS::AccountId'
        - role/aws-service-role/lambda.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_LambdaConcurrency
    ServiceNamespace: lambda
    ScalableDimension: lambda:function:ProvisionedConcurrency
  DependsOn: FunctionAliasLive

Memory Usage and Profiling:

Lambda CPU power scales with memory allocation: increasing memory increases CPU performance. If the function is CPU- or network-intensive, allocating more memory can improve its performance dramatically.

CPU-bound function example:

Compute all primes ≤ 1,000,000, 1,000 times:

  128 MB → 11.722 s → $0.024628
  256 MB →  6.678 s → $0.028035
  512 MB →  3.194 s → $0.026830
 1024 MB →  1.465 s → $0.024638

Because Lambda bills in GB-seconds, tuning the function to more memory can be the cheaper option, since execution time falls faster than the per-GB price rises.
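The arithmetic behind the table can be sketched in a few lines of Python (assuming the us-east-1 x86 prices of $0.0000166667 per GB-second plus $0.20 per million requests; the published figures differ slightly due to billing-granularity rounding):

```python
# Approximate the benchmark table: cost of 1,000 invocations at each
# memory setting, using assumed us-east-1 x86 pricing.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000

def cost_per_1000_invocations(memory_mb: int, duration_s: float) -> float:
    gb_seconds = (memory_mb / 1024) * duration_s
    return 1000 * (gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST)

for memory_mb, duration_s in [(128, 11.722), (256, 6.678),
                              (512, 3.194), (1024, 1.465)]:
    cost = cost_per_1000_invocations(memory_mb, duration_s)
    print(f"{memory_mb:>4} MB  {duration_s:>6.3f} s  ${cost:.6f}")
```

At 1024 MB the function is roughly 8x faster than at 128 MB for nearly the same cost, which is why more memory can be the cheaper configuration.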

AWS Lambda Power Tuning is an open-source tool that helps fine-tune a function's memory against performance and cost.

Git repo: https://github.com/alexcasalboni/aws-lambda-power-tuning

Features:

  1. Data-driven cost and performance optimization
  2. Available from the Serverless Application Repository
  3. Easy to integrate with CI/CD
  4. Compares two functions
{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": "{}",
  "parallelInvocation": true,
  "strategy": "cost|speed|balanced",
  "balancedWeight": 0.5
}

powerValues can also be set to "ALL". balancedWeight applies only to the balanced strategy and ranges from 0 (equivalent to the speed strategy) to 1 (equivalent to the cost strategy).

{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": "{}",
  "parallelInvocation": true,
  "autoOptimize": true,
  "autoOptimizeAlias": "Live"
}

Using Power Tuning, we can compare the performance, cost, and speed of x86 and Arm/Graviton2 processors.

Architecture and best practices:

Optimization Best Practices:

  1. Avoid monolithic functions (reduces deployment package size; micro/nano services)
  2. Minify/uglify production code
  3. Optimize dependencies
  4. Lazily initialize shared libs/objects (helps if there are multiple functions per file)

Optimized dependency usage (Node.js SDK & X-Ray):

// const AWS = require('aws-sdk')
const DynamoDB = require('aws-sdk/clients/dynamodb') // 125 ms faster

X-Ray usage:

// const AWSXray = require('aws-xray-sdk')
const AWSXray = require('aws-xray-sdk-core') // 5 ms faster

Lazy initialization example (Python and boto3):

import boto3

s3_client = None
ddb_client = None

def get_objects(event, context):
    global s3_client
    if not s3_client:
        s3_client = boto3.client("s3")
    # business logic

def get_items(event, context):
    global ddb_client
    if not ddb_client:
        ddb_client = boto3.client("dynamodb")
    # business logic

Note: Not a great option when using provisioned concurrency, since the INIT code is pre-run before invocations anyway.

Amazon RDS Proxy:

A fully managed, highly available database proxy for Amazon RDS. It pools and shares connections to make applications more scalable, more resilient to database failures, and more secure.

Optimization best practices(performance/cost):

  1. Externalize orchestration → Avoid idle/sleep; delegate waits to a Step Function
  2. Fine-tune resource allocation → Don’t guess function memory; measure it
  3. Transform, not transport → Minimize data transfer (S3 Select, advanced filtering)
  4. Lambda Destinations → Simplified chaining (async) and DLQs
  5. Discard uninteresting events as early as possible → Trigger configuration (S3 prefixes, SNS filters)
  6. Keep retry policies in mind → Very powerful, but not free

Reusing Connection with Keep-Alive:

  1. For functions making HTTPS requests
  2. Enable in the SDK with environment variables
  3. Or set the Keep-Alive property in function code
  4. Can reduce a typical DynamoDB operation from 30 ms to 10 ms
  5. Available in most runtime SDKs

Service Integration:

Service integration is common in modern application development.

  1. Adding services increases latency, which is mainly a synchronous concern (for example, a service invokes Lambda and Lambda calls another service)
  2. Use asynchronous rather than synchronous integration where possible
  3. Use VTL where appropriate
  4. Use the Step Functions direct SDK integration feature
  5. Use Lambda to transform data, not to transport it
  6. Avoid Lambda calling Lambda

Lambda Invocation Model:

Lambda has three invocation models:

  1. Synchronous(request/response)
  2. Asynchronous(event)
  3. Event Source Mapping(stream/queue poller)
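The first two models are selected per call through the Invoke API's InvocationType parameter; the third is configured on the event source rather than at invoke time. A small sketch using boto3 (the function name is hypothetical, and the actual network call is left commented out):

```python
import json

def invoke_params(function_name: str, payload: dict, asynchronous: bool) -> dict:
    """Build the kwargs for boto3's Lambda client.invoke()."""
    return {
        "FunctionName": function_name,
        # "RequestResponse" blocks until the handler returns its result;
        # "Event" queues the payload and returns an HTTP 202 ack immediately.
        "InvocationType": "Event" if asynchronous else "RequestResponse",
        "Payload": json.dumps(payload),
    }

# boto3.client("lambda").invoke(**invoke_params("my-function", {"id": 42}, asynchronous=True))
```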

Comparing sync vs async lambda services:

Lambda Service A → Lambda Service B

Synchronous:

  1. The caller waits
  2. Waiting incurs cost
  3. A downstream slowdown affects the entire process
  4. Process change is complex
  5. Payloads are passed between steps

Lambda Service A → SQS → Lambda Service B

Asynchronous:

  1. The caller receives an acknowledgment quickly
  2. Minimizes the cost of waiting
  3. Queueing separates fast and slow processes
  4. Process change is easy
  5. Transaction IDs are passed between steps

Lambda Performance → Summary

Cold Starts:

  1. Causes of cold starts
  2. VPC improvements
  3. Provisioned concurrency

Memory and Profiling:

  1. Memory allocation controls the power of Lambda
  2. AWS Lambda power tuning
  3. Trade-off cost and speed

Architecture and Optimization:

  1. Best Practices
  2. RDS Proxy
  3. Async and Service Integration
