I am trying to set up some build and deployment servers based on EC2 instances to deploy software to AWS via CloudFormation.
The current setup uses the AWS CLI to deploy CloudFormation templates, and authentication is handled using a credentials profile where the ~/.aws/config file has a profile with:
[profile x]
role_arn = x
credential_source = Ec2InstanceMetadata
region = x
The setup using the AWS CLI appears to be working fine, and can deploy CloudFormation templates, upload files to S3 etc.
I wanted to automate this further and use a configuration-based approach to allow for more flexibility in our deployments. To achieve this, I have written some Python code that parses a config file and uses the Boto3 library (which the AWS CLI also uses) to replicate the functionality. However, when I try to do similar things in Boto3 (like deploying CloudFormation and uploading files to S3), I get the following error: Connection to sts.amazonaws.com timed out. Unfortunately I can't provide the full stack trace since it's on a separate network. I am running Python 3.7 with boto3 1.21.13 and botocore 1.24.13.
I assume it might be because I need to set up a VPC endpoint for STS? However, I can't work out why the AWS CLI works fine but Boto3 doesn't, especially since the AWS CLI uses Boto3 under the hood.
In addition, I have confirmed that I can retrieve instance metadata using curl from the EC2 instances.
To reproduce the error, this command fails for me:
python -c "import boto3; print(boto3.Session(profile_name='x').client('s3').list_objects(Bucket='bucket'))"
However this AWS cli command works:
aws --profile x s3 ls bucket
I guess I don't understand why the AWS CLI command works when the boto3 command fails. Why does boto3 need to call the sts.amazonaws.com endpoint when the AWS CLI seemingly doesn't? What am I missing?
A minor detail: the AWS CLI and boto3 are both built on botocore. Nevertheless, when run in the same environment with the same access to the credentials, the CLI and boto3 should indeed be able to reach the same endpoint.
This:
aws sts get-caller-identity --profile x
and:
python -c "import boto3;print(boto3.Session(profile_name='x').client('sts').get_caller_identity())"
are equivalent and should make the same api calls to the same endpoint.
As an aside, I find it is often best not to have your code concerned with session handling at all. It seems simplest to me for the code to expect the environment to handle that. So just export AWS_PROFILE and run the code. This saves other users of the script from having to have the same profile under the same name.
Yeah so it turns out I just needed to set/export AWS_STS_REGIONAL_ENDPOINTS='regional'.
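After many hours of trawling the botocore and awscli source and logs, I found out that botocore sets it to 'legacy' by default, whereas v2 of the AWS CLI sets it to 'regional'.
With 'legacy', STS calls go to the global sts.amazonaws.com endpoint; with 'regional', they go to the STS endpoint of the configured region.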
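For scripts that cannot rely on the calling shell, the same setting can be applied from Python before the session is created. This is only a minimal sketch: the helper names use_regional_sts and make_session are made up for illustration, and 'x' is the placeholder profile from the question.

```python
import os


def use_regional_sts(env=None):
    """Default AWS_STS_REGIONAL_ENDPOINTS to 'regional' unless already set."""
    env = os.environ if env is None else env
    env.setdefault("AWS_STS_REGIONAL_ENDPOINTS", "regional")
    return env["AWS_STS_REGIONAL_ENDPOINTS"]


def make_session(profile_name):
    """Create a boto3 session after defaulting to regional STS endpoints."""
    import boto3  # imported lazily so the helper above works without boto3

    use_regional_sts()
    return boto3.Session(profile_name=profile_name)


# Example (needs boto3 and a configured profile):
# session = make_session("x")
# print(session.client("sts").get_caller_identity())
```

The same value can also be set per profile with sts_regional_endpoints = regional in ~/.aws/config, which keeps the code free of environment handling.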
I created an EMR cluster using the AWS UI but did not add a bootstrap action to install boto3. Now I am getting ready to execute PySpark scripts which use boto3, so I SSH'd to the master node and am attempting to install boto3. Below is a screenshot showing me installing it again (so the messages say that it is already installed), but you get the point. Then I run Python 3 interactively to test boto3 and it can't find it. What am I doing wrong? Also, will I need to install boto3 on the slave nodes as well?
Thanks
You installed boto3 for Python 2, so Python 3 can't see it.
Try to install boto3 with pip3:
pip3 install boto3 --user
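To confirm which interpreter is running and whether it can see the package, a quick check like the following can help. It is a small sketch using only the standard library; the function name is_importable is made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can find the named module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Shows which Python binary is in use and whether it can import boto3
    print("interpreter:", sys.executable)
    print("boto3 importable:", is_importable("boto3"))
```

Running the install as python3 -m pip install --user boto3 ties it to the same interpreter that will later import the package.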
I am facing issues with the boto3 version (1.9.42) that comes preinstalled on AWS, which fails when I run certain commands. I don't see the issue when I run the script locally, because the boto3 installed locally is version 1.9.118, whereas AWS comes with 1.9.42.
client_api = boto3.client(service_name='apigatewaymanagementapi', endpoint_url=endpoint_url)
[ERROR] UnknownServiceError: Unknown service: 'apigatewaymanagementapi'
I know there is no direct way to replace boto3 in AWS. Is there any way I can deploy my local boto3 to AWS and use that module from AWS Lambda functions?
I upload my Lambda function sources from AWS CodeBuild. My Python script uses NLTK, so it needs a lot of data. My .zip package is too big and a RequestEntityTooLargeException occurs. I want to know how to increase the size of the deployment package sent via the UpdateFunctionCode command.
I use AWS CodeBuild to transform the source from a GitHub repository to AWS Lambda. Here is the associated buildspec file:
version: 0.2
phases:
  install:
    commands:
      - echo "install step"
      - apt-get update
      - apt-get install zip -y
      - apt-get install python3-pip -y
      - pip install --upgrade pip
      - pip install --upgrade awscli
      # Define directories
      - export HOME_DIR=`pwd`
      - export NLTK_DATA=$HOME_DIR/nltk_data
  pre_build:
    commands:
      - echo "pre_build step"
      - cd $HOME_DIR
      - virtualenv venv
      - . venv/bin/activate
      # Install modules
      - pip install -U requests
      # NLTK download
      - pip install -U nltk
      - python -m nltk.downloader -d $NLTK_DATA wordnet stopwords punkt
      - pip freeze > requirements.txt
  build:
    commands:
      - echo 'build step'
      - cd $HOME_DIR
      - mv $VIRTUAL_ENV/lib/python3.6/site-packages/* .
      - sudo zip -r9 algo.zip .
      - aws s3 cp --recursive --acl public-read ./ s3://hilightalgo/
      - aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip
      - aws lambda update-function-configuration --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --environment 'Variables={NLTK_DATA=/var/task/nltk_data}'
  post_build:
    commands:
      - echo "post_build step"
When I launch the pipeline, I get a RequestEntityTooLargeException because there is too much data in my .zip package. See the build logs below:
[Container] 2019/02/11 10:48:35 Running command aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip
An error occurred (RequestEntityTooLargeException) when calling the UpdateFunctionCode operation: Request must be smaller than 69905067 bytes for the UpdateFunctionCode operation
[Container] 2019/02/11 10:48:37 Command did not exit successfully aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip exit status 255
[Container] 2019/02/11 10:48:37 Phase complete: BUILD Success: false
[Container] 2019/02/11 10:48:37 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: aws lambda update-function-code --function-name arn:aws:lambda:eu-west-3:671560023774:function:LaunchHilight --zip-file fileb://algo.zip. Reason: exit status 255
Everything works correctly when I reduce the NLTK data to download (I tried with only the packages stopwords and wordnet).
Does anyone have an idea to solve this "size limit problem"?
You cannot increase the deployment package size for Lambda. AWS Lambda limits are described in the AWS Lambda developer guide. More information on how those limits work can be seen here. In essence, your unzipped package size has to be less than 250MB (262144000 bytes).
PS: Using layers doesn't solve the sizing problem, though it helps with management and maybe gives a faster cold start. The package size includes the layers, as the Lambda layers documentation states:
A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB.
Update Dec 2020: As per the AWS blog, and as pointed out by user jonnocraig in this answer, you can overcome these restrictions if you build a container for your application and run it on Lambda.
If anyone stumbles across this issue post December 2020, there's been a major update from AWS to support Lambda functions as container images (up to 10GB!!). More info here
AWS Lambda functions can mount EFS. You can load libraries or packages that are larger than the 250 MB package deployment size limit of AWS Lambda using EFS.
Detailed steps on how to set it up are here:
https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/
On a high level, the changes include:
Create and set up an EFS file system
Use EFS with the Lambda function
Install the pip dependencies inside EFS access point
Set the PYTHONPATH environment variable to tell where to look for the dependencies
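As an alternative to setting PYTHONPATH in the function configuration, the dependency directory can be added to the import path at the top of the handler module. This is a sketch under assumptions: the mount path /mnt/efs/python and the helper name add_dependency_dir are hypothetical, not taken from the blog post.

```python
import os
import sys


def add_dependency_dir(path, search_path=None):
    """Put an existing directory at the front of the import search path."""
    search_path = sys.path if search_path is None else search_path
    if not os.path.isdir(path):
        return False  # mount missing or packages not installed yet
    if path not in search_path:
        search_path.insert(0, path)
    return True


# Hypothetical EFS mount path; call before importing the large packages:
# add_dependency_dir("/mnt/efs/python")
```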
The following are hard limits for Lambda (may change in future):
3 MB for in-console editing
50 MB zipped as package for upload
250 MB when unzipped including layers
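A build step can check a package against these limits before attempting the upload. The sketch below uses only the standard library and treats 1 MB as 1024 * 1024 bytes, which matches the 262144000-byte figure quoted above; the function names are made up for illustration, and sizes of any attached layers are not counted.

```python
import os
import zipfile

ZIPPED_LIMIT = 50 * 1024 * 1024     # direct upload limit for the .zip
UNZIPPED_LIMIT = 250 * 1024 * 1024  # limit after extraction (layers not counted here)


def package_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a deployment package."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        unzipped = sum(info.file_size for info in zf.infolist())
    return zipped, unzipped


def fits_zip_upload(zip_path):
    """True if the package is within both limits listed above."""
    zipped, unzipped = package_sizes(zip_path)
    return zipped <= ZIPPED_LIMIT and unzipped <= UNZIPPED_LIMIT


# Example: fits_zip_upload("algo.zip")
```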
A sensible way to get around this is to mount EFS from your Lambda. This can be useful not only for loading libraries, but also for other storage.
Have a look through these blogs:
https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/
https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/
I have not tried this myself, but the folks at Zappa describe a trick that might help. Quoting from https://blog.zappa.io/posts/slim-handler:
Zappa zips up the large application and sends the project zip file up to S3. Second, Zappa creates a very minimal slim handler that just contains Zappa and its dependencies and sends that to Lambda.
When the slim handler is called on a cold start, it downloads the large project zip from S3 and unzips it in Lambda’s shared /tmp space. All subsequent calls to that warm Lambda share the /tmp space and have access to the project files; so it is possible for the file to only download once if the Lambda stays warm.
This way you should get roughly 500MB in /tmp (the default /tmp size is 512 MB).
Update:
I have used the following code in the Lambdas of a couple of projects. It is based on the method Zappa uses, but can be used directly.
# Based on the code in https://github.com/Miserlou/Zappa/blob/master/zappa/handler.py
# We need to load the layer from an S3 bucket into /tmp, bypassing the normal
# AWS layer mechanism, since it is too large: the AWS limit on unzipped Lambda
# function size, including layers, is 250MB.
import io
import os
import sys
import zipfile

import boto3


def load_remote_project_archive(remote_bucket, remote_file, layer_name):
    # Puts the project files from S3 in /tmp and adds to path
    project_folder = '/tmp/{0!s}'.format(layer_name)
    if not os.path.isdir(project_folder):
        # The project folder doesn't exist in this cold lambda, get it from S3
        boto_session = boto3.Session()
        # Download zip file from S3
        s3 = boto_session.resource('s3')
        archive_on_s3 = s3.Object(remote_bucket, remote_file).get()
        # unzip from stream
        with io.BytesIO(archive_on_s3["Body"].read()) as zf:
            # rewind the file
            zf.seek(0)
            # Read the file as a zipfile and process the members
            with zipfile.ZipFile(zf, mode='r') as zipf:
                zipf.extractall(project_folder)
    # Add to project path (also on warm starts, when the folder already exists)
    sys.path.insert(0, project_folder)
    return True
This can then be called as follows (I pass the bucket with the layer to the lambda function via an env variable):
load_remote_project_archive(os.environ['MY_ADDITIONAL_LAYERS_BUCKET'], 'lambda_my_extra_layer.zip', 'lambda_my_extra_layer')
At the time I wrote this code, /tmp was also capped (I think at 250MB), but the call to zipf.extractall(project_folder) above can be replaced with extracting directly to memory: unzipped_in_memory = {name: zipf.read(name) for name in zipf.namelist()}
I did this for some machine learning models. I guess the answer from @rahul is more versatile for this, though.
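The in-memory variant can be sketched as a standalone function that takes the downloaded archive bytes. The name unzip_to_memory is made up for illustration. Note that this suits data files such as model weights; Python modules held in a dict like this cannot be imported through sys.path.

```python
import io
import zipfile


def unzip_to_memory(zip_bytes):
    """Map each file name in the archive to its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), mode="r") as zipf:
        return {
            name: zipf.read(name)
            for name in zipf.namelist()
            if not name.endswith("/")  # skip directory entries
        }


# Example, with archive_on_s3 obtained as in the function above:
# files = unzip_to_memory(archive_on_s3["Body"].read())
```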
From the AWS documentation:
If your deployment package is larger than 50 MB, we recommend
uploading your function code and dependencies to an Amazon S3 bucket.
You can create a deployment package and upload the .zip file to your
Amazon S3 bucket in the AWS Region where you want to create a Lambda
function. When you create your Lambda function, specify the S3 bucket
name and object key name on the Lambda console, or using the AWS
Command Line Interface (AWS CLI).
You can use the AWS CLI to deploy the package, and instead of using the --zip-file argument to pass the deployment package, you can specify the object in the S3 bucket with the --code parameter. Ex:
aws lambda create-function --function-name my_function --code S3Bucket=my_bucket,S3Key=my_file
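For an existing function, as in the question, the equivalent is to upload the package to S3 and then call UpdateFunctionCode with the bucket and key instead of the zip bytes. Below is a rough sketch with boto3; the helper name deploy_from_s3 and the argument values in the example are placeholders, and the clients are passed in so the function itself has no hard dependency on credentials.

```python
def deploy_from_s3(s3_client, lambda_client, zip_path, bucket, key, function_name):
    """Upload the package to S3, then point the function at that object."""
    s3_client.upload_file(zip_path, bucket, key)
    return lambda_client.update_function_code(
        FunctionName=function_name,
        S3Bucket=bucket,
        S3Key=key,
    )


# Example (needs boto3 and AWS credentials; names are placeholders):
# import boto3
# deploy_from_s3(boto3.client("s3"), boto3.client("lambda"),
#                "algo.zip", "my_bucket", "algo.zip", "my_function")
```

On the CLI, the matching options for update-function-code are --s3-bucket and --s3-key. This route only lifts the upload size limit; the 250 MB unzipped limit still applies.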
This aws wrangler zip file from github (https://github.com/awslabs/aws-data-wrangler/releases) includes many other libraries like pandas and pymysql. In my case it was the only layer I needed since it has so much other stuff. Might work for some people.
You can try the workaround used in the awesome serverless-python-requirements plugin.
The ideal solution is to use Lambda layers if that serves the purpose. If the total dependency size is greater than 250MB, you can sideload less frequently used dependencies from an S3 bucket at run time, using the 512 MB provided in the /tmp directory. The zipped dependencies are stored in S3 and the Lambda can fetch the files from S3 during initialisation. Unzip the dependency package and add its path to sys.path.
Please note that the Python dependencies need to be built on Amazon Linux, which is the operating system for Lambda containers. I used an EC2 instance to create the zip package.
You can check the code used in serverless-python-requirements here
Before 2021, the best way was to deploy the jar file to S3 and create the AWS Lambda function from it.
From 2021, AWS Lambda supports container images. Read here: https://aws.amazon.com/de/blogs/aws/new-for-aws-lambda-container-image-support/
So from now on, you should probably consider packaging and deploying your Lambda functions as container images (up to 10 GB).
The tip for deploying a large Lambda project to AWS is to use a Docker image stored in the AWS ECR service instead of a ZIP file. You can use a Docker image of up to 10 GB.
The AWS documentation provides an example to help you here:
Create an image from an AWS base image for Lambda
May be late to the party but you can use a Docker Image to get around the lambda layer constraint. This can be done using serverless stack development or just through the console.
You cannot increase the package size, but you can use AWS Lambda layers to store some application dependencies.
https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html#configuration-layers-path
Before layers, a commonly used pattern to work around this limitation was to download huge dependencies from S3.
I'm trying to use this very simple command:
import boto3
client = boto3.client('sagemaker-runtime')
listed in the documentation
but I'm getting this error:
UnknownServiceError: Unknown service: 'sagemaker-runtime'. Valid service names are: acm, etc..
My goal is to be able to invoke the endpoint that I've created in Amazon SageMaker.
I'm doing this from a Jupyter notebook in Sagemaker, so I feel like this should work no problem. How do I get it to run here, and outside of the Sagemaker environment?
Amazon SageMaker is a very new service (December 2017).
You will need to update your boto library to use it:
sudo pip install boto --upgrade
sudo pip install boto3 --upgrade
sudo pip install awscli --upgrade
The documentation is incorrect. This is how you get the client with the SageMaker Python SDK.
import boto3
client = boto3.client('runtime.sagemaker')
I've done this successfully. And, as John said, be sure to update your versions of boto3 and awscli.
If you use a Jupyter notebook, create a cell at the top and execute the line below.
!pip install boto3 --upgrade
In my case, upgrading boto3 in the terminal didn't work.
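When an "Unknown service" error appears, it can help to check what the installed boto3 actually knows before and after upgrading. The following is a minimal sketch: pick_service_name is a made-up helper, and the two candidate names are simply the ones mentioned in this thread.

```python
def pick_service_name(available, candidates):
    """Return the first candidate name that the installed SDK knows, else None."""
    known = set(available)
    for name in candidates:
        if name in known:
            return name
    return None


# Example (needs boto3):
# import boto3
# print(boto3.__version__)
# name = pick_service_name(
#     boto3.Session().get_available_services(),
#     ["sagemaker-runtime", "runtime.sagemaker"],
# )
# client = boto3.client(name) if name else None
```

If neither name is available, the installed boto3 most likely predates the service and needs the upgrade described above.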