2022-09-13

Provisioned Concurrency in AWS Lambda


Provisioned Concurrency is an AWS Lambda feature that keeps a specified number of function instances initialized and ready to respond, reducing the latency associated with cold starts. A cold start occurs whenever Lambda has to create and initialize a new instance before it can serve a request, for example on the first invocation or when a sudden spike in requests exceeds the number of available instances. With Provisioned Concurrency, you maintain a pool of pre-warmed instances to minimize cold start latency and keep response times consistent.

The main benefits of using Provisioned Concurrency in AWS Lambda are:

  • Reduced Latency
    By pre-warming Lambda function instances, the time it takes for a request to be served is significantly reduced, resulting in a faster and more consistent user experience.

  • Predictable Performance
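    Provisioned Concurrency keeps a constant pool of initialized instances ready to handle requests, so traffic up to the configured level does not wait for Lambda to scale up on demand, giving you a predictable performance level. Requests beyond that level are still served by on-demand instances and can experience cold starts.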

  • Better Control
    You can configure the number of pre-warmed instances based on your application's requirements, providing you with better control over your application's performance and cost.

When to Use Provisioned Concurrency

Provisioned Concurrency is most beneficial for applications that require low-latency responses and have a predictable load pattern. Use cases include:

  • Interactive Applications
    Applications like web or mobile apps, where users expect a snappy and consistent response time, can greatly benefit from Provisioned Concurrency.

  • Scheduled Jobs
    If you have Lambda functions running as scheduled jobs or cron tasks, you can use Provisioned Concurrency to ensure that your function starts executing immediately when triggered.

  • High-Throughput Services
    For services that need to process a high volume of requests with low latency, Provisioned Concurrency can help maintain a steady performance level.

However, it's important to note that Provisioned Concurrency might not be suitable for all scenarios, especially if your application has highly variable or unpredictable workloads, as it could lead to increased costs.

Understanding the Pricing Model

When using Provisioned Concurrency, you are billed based on the number of pre-warmed instances you configure and the duration for which they are provisioned. The cost is calculated using the following components:

  • Provisioned Concurrency
    You are charged for the number of pre-warmed instances you have configured, regardless of whether they are used or not. The charge is measured in GB-seconds: the product of the memory size of your function, the number of provisioned instances, and the time for which they are provisioned.

  • Duration
    You are also billed for the time your code spends executing on the provisioned instances, again measured in GB-seconds (memory size multiplied by execution time).

  • Requests
    You are also billed for the number of requests served by your Lambda function, including both the provisioned and on-demand instances.

It is crucial to carefully consider the pricing implications before implementing Provisioned Concurrency. To optimize costs, you should analyze your application's load patterns and configure the number of pre-warmed instances accordingly. Keep in mind that having too many unused instances can lead to increased costs, while having too few instances can result in cold starts and increased latency.
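As a starting point for that analysis, steady-state concurrency can be approximated as requests per second multiplied by average request duration. The sketch below applies that rule of thumb; the function name and the `headroom` parameter are illustrative assumptions, not part of any AWS API, and the result should be checked against your own metrics.

```python
import math


def estimate_provisioned_concurrency(requests_per_second, avg_duration_seconds, headroom=0.0):
    """Rough estimate of how many instances are busy at once.

    concurrency ~= requests per second * average duration in seconds.
    `headroom` is an optional safety margin (0.5 means 50% extra).
    """
    if requests_per_second < 0 or avg_duration_seconds < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    needed = requests_per_second * avg_duration_seconds * (1 + headroom)
    # Round up: a fractional instance still needs a whole instance.
    return math.ceil(needed)


# 100 requests/s at 0.5 s each keeps about 50 instances busy.
print(estimate_provisioned_concurrency(100, 0.5))
```

Sizing to the typical load rather than the absolute peak is usually the cheaper choice, since occasional bursts can still be absorbed by on-demand instances at the price of some cold starts.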

Example Monthly Charge

Let's assume you have a Lambda function with a memory size of 1.5 GB and you've configured 20 instances of Provisioned Concurrency. The function is provisioned for an entire month (30 days), and during that time, it serves 25 million requests.

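In this example, we'll use the pricing for the US East (N. Virginia) region, which is $0.015 per GB-hour for Provisioned Concurrency (AWS quotes this rate as $0.0000041667 per GB-second) and $0.20 per 1 million requests. Check the AWS Lambda pricing page for current rates.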

  • Provisioned Concurrency cost: To calculate the total cost for Provisioned Concurrency, we first need to determine the GB-hours. Since each instance has 1.5 GB of memory and there are 20 instances provisioned for 30 days, the calculation is:
    • GB-hours = 1.5 GB * 20 instances * 24 hours/day * 30 days = 21600 GB-hours
    • Provisioned Concurrency cost = 21600 GB-hours * $0.015/GB-hour = $324
  • Request cost: The cost for 25 million requests is:
    • Requests cost = 25 million requests * ($0.20/1 million requests) = $5

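The total monthly charge for this example would be $329 (Provisioned Concurrency cost + Requests cost). This figure leaves out the duration charge for the time the function actually spends executing, which depends on how long each request runs.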
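The same arithmetic can be written as a small helper so you can try other memory sizes or instance counts. This is a minimal sketch: the function name is illustrative, the default rates are the ones used in this example, and execution duration charges are not included.

```python
HOURS_PER_DAY = 24


def monthly_charge(memory_gb, instances, days, requests,
                   pc_price_per_gb_hour=0.015,
                   price_per_million_requests=0.20):
    """Return (gb_hours, provisioned_cost, request_cost, total) in USD.

    Default prices are the example rates from this article.
    """
    gb_hours = memory_gb * instances * HOURS_PER_DAY * days
    provisioned_cost = gb_hours * pc_price_per_gb_hour
    request_cost = requests / 1_000_000 * price_per_million_requests
    return gb_hours, provisioned_cost, request_cost, provisioned_cost + request_cost


gb_hours, provisioned_cost, request_cost, total = monthly_charge(1.5, 20, 30, 25_000_000)
# Prints: 21600 GB-hours, $324.00 + $5.00 = $329.00
print(f"{gb_hours:.0f} GB-hours, ${provisioned_cost:.2f} + ${request_cost:.2f} = ${total:.2f}")
```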

Configuring Provisioned Concurrency

Setting Up Provisioned Concurrency in the AWS Console

To configure Provisioned Concurrency for a Lambda function using the AWS Management Console, follow these steps:

  1. Navigate to the AWS Management Console and sign in to your account.
  2. Open the AWS Lambda service by searching for Lambda in the search bar or selecting it from the Services menu.
  3. Locate the desired Lambda function in the list and click on its name to open the function configuration page.
  4. In the Function Configuration panel, select the Versions tab.
  5. Choose the version or alias of your Lambda function for which you want to enable Provisioned Concurrency.
  6. Scroll down to the Provisioned Concurrency section and click Edit.
  7. Enter the desired number of provisioned instances in the Provisioned Concurrency input field.
  8. Click Save to apply the changes.

Once the configuration is saved, AWS will begin pre-warming the specified number of instances for your Lambda function.
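Pre-warming is not instantaneous, so it can be useful to wait until the configuration reports that it is ready before sending traffic. The sketch below polls the `get_provisioned_concurrency_config` operation of a boto3 Lambda client; the helper name, polling interval, and timeout are assumptions chosen for illustration, and the client is passed in so the function can be exercised without AWS credentials.

```python
import time


def wait_until_ready(client, function_name, qualifier,
                     poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the provisioned concurrency configuration is READY.

    `client` is expected to behave like boto3.client("lambda").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        config = client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = config["Status"]  # IN_PROGRESS, READY, or FAILED
        if status == "READY":
            return config
        if status == "FAILED":
            raise RuntimeError(config.get("StatusReason", "allocation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("provisioned concurrency was not ready in time")
        sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   wait_until_ready(boto3.client("lambda"), "my-function", "prod")
```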

Using AWS CLI and SDKs

You can also configure Provisioned Concurrency using the AWS Command Line Interface (CLI) or SDKs. To set Provisioned Concurrency using the AWS CLI, run the following command:

bash
$ aws lambda put-provisioned-concurrency-config \
  --function-name <FUNCTION_NAME> \
  --qualifier <VERSION_OR_ALIAS> \
  --provisioned-concurrent-executions <NUMBER_OF_INSTANCES>

Replace <FUNCTION_NAME> with the name of your Lambda function, <VERSION_OR_ALIAS> with the desired version or alias, and <NUMBER_OF_INSTANCES> with the number of provisioned instances you want to configure.

For configuring Provisioned Concurrency using SDKs, refer to the relevant documentation for the SDK in your preferred programming language.
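As one example, the Python SDK (boto3) exposes the same operation as `put_provisioned_concurrency_config` on the Lambda client. The wrapper below is a minimal sketch: the helper name and the up-front checks are illustrative additions, and the client is injected so the call can be tested without an AWS account.

```python
def set_provisioned_concurrency(client, function_name, qualifier, count):
    """Configure provisioned concurrency for a published version or alias.

    `client` is expected to behave like boto3.client("lambda").
    """
    if qualifier == "$LATEST":
        # Provisioned concurrency requires a published version or an alias.
        raise ValueError("provisioned concurrency cannot target $LATEST")
    if count < 1:
        raise ValueError("count must be at least 1")
    return client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=count,
    )


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   set_provisioned_concurrency(boto3.client("lambda"), "my-function", "prod", 20)
```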

Managing Provisioned Concurrency with Infrastructure as Code

Infrastructure as Code (IaC) tools like AWS CloudFormation and Terraform enable you to manage and automate the configuration of your AWS resources, including Provisioned Concurrency for Lambda functions. Here's an example of how to configure Provisioned Concurrency using AWS CloudFormation:

yml
Resources:
  MyLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      # Function properties...

  MyLambdaFunctionProvisionedConcurrency:
    Type: AWS::Lambda::ProvisionedConcurrencyConfig
    Properties:
      FunctionName: !Ref MyLambdaFunction
      Qualifier: <VERSION_OR_ALIAS>
      ProvisionedConcurrentExecutions: <NUMBER_OF_INSTANCES>

Replace <VERSION_OR_ALIAS> with the desired version or alias, and <NUMBER_OF_INSTANCES> with the number of provisioned instances you want to configure.

For Terraform, you can use the aws_lambda_provisioned_concurrency_config resource to configure Provisioned Concurrency:

tf
resource "aws_lambda_function" "my_lambda_function" {
  # Function properties...
}

resource "aws_lambda_provisioned_concurrency_config" "my_lambda_function_provisioned_concurrency" {
  function_name                     = aws_lambda_function.my_lambda_function.function_name
  provisioned_concurrent_executions = <NUMBER_OF_INSTANCES>
  qualifier                         = <VERSION_OR_ALIAS>
}

Replace <VERSION_OR_ALIAS> with the desired version or alias, and <NUMBER_OF_INSTANCES> with the number of provisioned instances you want to configure.

Comparing the Effect of Global Scope

In AWS Lambda, global scope refers to the variables, objects, or resources that are defined outside the Lambda function handler. Global scope variables are initialized when the Lambda function container is created and are reused across multiple invocations of the same container. The global scope can impact the performance of your Lambda function, especially when using Provisioned Concurrency.

Let's compare the effects of global scope with an example:

python
import boto3

# Global scope variable
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")

def lambda_handler(event, context):
    # Function scope variable
    s3 = boto3.client("s3")

    # Function logic...

In this example, the dynamodb resource and table variable are in the global scope, while the s3 client is in the function scope. When using Provisioned Concurrency, the global scope variables are initialized only once when the container is created. This means that the DynamoDB resource and table are available for all invocations of the same container, reducing the initialization overhead for subsequent invocations.

On the other hand, the s3 client is initialized for each invocation, which may lead to increased latency. However, this may be negligible for infrequent or low-throughput use cases.
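The difference can be made visible locally by counting how often each kind of object is constructed. In the sketch below, `make_client` is a hypothetical stand-in for an expensive constructor such as `boto3.client(...)`, and the loop simulates three invocations that reuse the same container.

```python
def make_client(name):
    """Stand-in for an expensive constructor; counts how often it runs."""
    make_client.calls += 1
    return {"service": name}


make_client.calls = 0

# Global scope: constructed once, when the module is first loaded.
GLOBAL_CLIENT = make_client("dynamodb")


def lambda_handler(event, context):
    # Function scope: constructed again on every invocation.
    local_client = make_client("s3")
    return {"global": GLOBAL_CLIENT["service"], "local": local_client["service"]}


# Simulate three invocations served by the same container.
for _ in range(3):
    lambda_handler({}, None)

# 4 constructions in total: 1 in global scope + 3 in function scope.
print(make_client.calls)
```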

Static Initialization and Provisioned Concurrency

Static initialization refers to the process of initializing resources, modules, or data during the startup phase of the Lambda function container. This can have a significant impact on the cold start latency, especially for complex or resource-intensive applications.

When using Provisioned Concurrency, the static initialization occurs when the pre-warmed instances are created. This allows the Lambda function to start with the resources, modules, and data already initialized, reducing the latency for each invocation.

Let's consider an example where we use static initialization to load a machine learning model:

python
import json
import boto3
import numpy as np
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package

# Static initialization
s3 = boto3.client("s3")
s3.download_file("my-bucket", "models/model.pkl", "/tmp/model.pkl")
model = joblib.load("/tmp/model.pkl")

def lambda_handler(event, context):
    # Function logic
    input_data = np.array(json.loads(event["body"])["input"])
    prediction = model.predict(input_data)

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction.tolist()})
    }

In this example, the machine learning model is downloaded and loaded during the static initialization phase. When using Provisioned Concurrency, the model is already loaded and available for all invocations, leading to faster response times.

Invoking Lambda Functions with Versions or Aliases for Provisioned Concurrency

To get the benefits of Provisioned Concurrency, you need to invoke the specific version or alias of the Lambda function for which you have configured the pre-warmed instances. Invoking the unqualified function name runs $LATEST, which cannot have Provisioned Concurrency configured. This section explains how to invoke a Lambda function using a version or alias to utilize the provisioned concurrency.

Invoking Lambda Functions with Versions

Lambda function versions are immutable snapshots of your function code and configuration. To invoke a specific version of a Lambda function, you need to append the version number to the function name using the :version_number syntax.

Example of invoking a specific version using the AWS CLI:

bash
$ aws lambda invoke --function-name <FUNCTION_NAME>:<VERSION_NUMBER> --payload '{"key": "value"}' response.json

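Replace <FUNCTION_NAME> with the name of your Lambda function and <VERSION_NUMBER> with the version number for which you have configured the provisioned concurrency. If you use AWS CLI version 2, also add --cli-binary-format raw-in-base64-out so that the JSON payload is passed as raw text rather than being treated as base64.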

Invoking Lambda Functions with Aliases

An alias is a named reference to a specific function version. Using aliases can simplify your function invocation and deployment process. You can create an alias for the version of your Lambda function with provisioned concurrency and then update the alias when you want to switch to a new version.

Example of invoking a Lambda function using an alias with the AWS CLI:

bash
$ aws lambda invoke --function-name <FUNCTION_NAME>:<ALIAS_NAME> --payload '{"key": "value"}' response.json

Replace <FUNCTION_NAME> with the name of your Lambda function and <ALIAS_NAME> with the alias name for which you have configured the provisioned concurrency.
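From Python, the same qualified invocation can be made through the `invoke` operation of a boto3 Lambda client by passing the version number or alias name as `Qualifier`. The helper below is a sketch under that assumption; its name is illustrative, and the client is injected so it can be tested without AWS access.

```python
import json


def invoke_qualified(client, function_name, qualifier, payload):
    """Invoke a specific version or alias and return the decoded JSON result.

    `client` is expected to behave like boto3.client("lambda").
    """
    response = client.invoke(
        FunctionName=function_name,
        Qualifier=qualifier,  # version number or alias name
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # The response payload is a stream; read it once and parse the JSON.
    return json.loads(response["Payload"].read())


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   result = invoke_qualified(boto3.client("lambda"), "my-function", "prod", {"key": "value"})
```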

References

https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html
https://docs.aws.amazon.com/lambda/latest/operatorguide/global-scope.html
https://www.youtube.com/watch?v=7Bc97tAySkU&ab_channel=CloudWithRaj

Ryusei Kakujo
