I am creating a DeepLens project to recognise people when one of a select group of people is scanned by the camera.
The project uses a Lambda function, which processes the images and calls the AWS Rekognition API.
When I trigger the API from my local machine, I get a good response.
When I trigger the API from the AWS console, I get a failed response.
Problem
After much digging, I found that 'boto3' (the AWS Python library) is at version:
1.9.62 - on my local machine
1.8.9 - on the AWS console
Question
Can I upgrade the 'boto3' library version in the AWS Lambda console? If so, how?
If you don't want to package a more recent boto3 version with your function, you can download boto3 on each invocation of the Lambda. Remember that /tmp/ is the directory that Lambda allows you to write to, so you can use it to temporarily download boto3:
import sys
from pip._internal import main
main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])
sys.path.insert(0,'/tmp/')
import boto3
from botocore.exceptions import ClientError
def handler(event, context):
    print(boto3.__version__)
You can achieve the same with either a Python function with dependencies or with a virtual environment.
These are the available options; other than that, you could also try contacting the Amazon team to see whether they can help you with the upgrade.
I know you're asking for a solution through the console, but as far as I know this is not possible.
To solve this you need to provide the boto3 version you require to your lambda (either with the solution from user1998671 or with what Shivang Agarwal is proposing). A third solution is to provide the required boto3 version as a layer for the lambda. The big advantage of the layer is that you can re-use it for all your lambdas.
This can be achieved by following this guide from AWS (the following is mainly copied from the linked AWS guide):
IMPORTANT: Make sure to replace boto3-mylayer with a name that suits you.
Create a lib folder by running the following command:
LIB_DIR=boto3-mylayer/python
mkdir -p $LIB_DIR
Install the library to LIB_DIR by running the following command:
pip3 install boto3 -t $LIB_DIR
Zip all the dependencies to /tmp/boto3-mylayer.zip by running the following command:
cd boto3-mylayer
zip -r /tmp/boto3-mylayer.zip .
Publish the layer by running the following command:
aws lambda publish-layer-version --layer-name boto3-mylayer --zip-file fileb:///tmp/boto3-mylayer.zip
The command returns the new layer's Amazon Resource Name (ARN), similar to the following one:
arn:aws:lambda:region:$ACC_ID:layer:boto3-mylayer:1
To attach this layer to your lambda execute the following:
aws lambda update-function-configuration --function-name <name-of-your-lambda> --layers <layer ARN>
To verify the boto3 version in your Lambda you can simply add the following two print statements to your Lambda function:
print(boto3.__version__)
print(botocore.__version__)
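For completeness, a minimal handler sketch containing those checks could look like the following; note that botocore must be imported separately for the second print to work:
import boto3
import botocore

def handler(event, context):
    # With the layer attached, these should report the layer's versions,
    # not the ones bundled with the Lambda runtime.
    print(boto3.__version__)
    print(botocore.__version__)
    return {"boto3": boto3.__version__, "botocore": botocore.__version__}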
Related
I am trying to use the CloudFormation package command to include the Glue script and extra Python files from the repo, to be uploaded to S3 during the package step.
For the Glue script it's straightforward; I can use:
Properties:
  Command:
    Name: pythonshell #glueetl -spark # pythonshell -python shell...
    PythonVersion: 3
    ScriptLocation: "../glue/test.py"
But how would I be able to do the same for extra Python files? The following does not work; it seems that I could upload the file using the Include Transform, but I'm not sure how to reference it back in extra-py-files.
DefaultArguments:
  "--extra-py-files":
    - "../glue/test2.py"
Sadly, you can't do this. For Glue, package only supports the
Command.ScriptLocation property for the AWS::Glue::Job resource
Packaging DefaultArguments is not supported. This means that you have to do it "manually" (e.g. with a bash script) outside of CloudFormation.
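As a rough illustration of that manual step, a small boto3 script could upload the extra file and print the S3 URI to pass in --extra-py-files; the bucket name and key below are placeholders:
import boto3

# Placeholder bucket and key - adjust to your deployment bucket and prefix.
BUCKET = "my-deployment-bucket"
KEY = "glue/extra/test2.py"

s3 = boto3.client("s3")
# Upload the extra Python file that the CloudFormation package step won't handle.
s3.upload_file("../glue/test2.py", BUCKET, KEY)

# Pass this URI to the job's "--extra-py-files" default argument.
print("s3://{}/{}".format(BUCKET, KEY))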
Trying to run an AWS Glue Python Shell job, but it gives me a Connect Timeout Error
Error Image : https://i.stack.imgur.com/MHpHg.png
Script : https://i.stack.imgur.com/KQxkj.png
It looks like you didn't add a Secrets Manager endpoint to your VPC. Since the traffic never leaves the AWS network, there is no internet access from inside your Glue job's VPC. So if you want to connect to Secrets Manager, you need to add a VPC endpoint for it.
Refer to this on how you can add it to your VPC, and this to make sure you have properly configured security groups.
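A hedged sketch of creating such an interface endpoint with boto3 follows; the VPC, subnet and security-group IDs, as well as the region in the service name, are placeholders:
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

# Create an interface endpoint so the Glue job can reach Secrets Manager
# without leaving the AWS network. All IDs below are placeholders.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.secretsmanager",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)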
AWS Glue Git Issue
Hi,
We got AWS Glue Python Shell working with all dependencies as follows. The Glue job has an awscli dependency as well, along with boto3.
AWS Glue Python Shell with Internet
Add the awscli and boto3 whl files to the Python library path during Glue job execution. This option is slow as it has to download and install the dependencies.
Download the following whl files
awscli-1.18.183-py2.py3-none-any.whl
boto3-1.16.23-py2.py3-none-any.whl
Upload the files to an S3 bucket under your given Python library path
Add the S3 whl file paths to the Python library path. Provide the full S3 path of each whl file, separated by commas
AWS Glue Python Shell without Internet connectivity
Reference: AWS Wrangler Glue dependency build
We followed the steps from that reference for the awscli and boto3 whl files.
Below is the latest requirements.txt compiled for the newest versions
colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
Download the dependencies to the libs folder
pip download -r requirements.txt -d libs
Also move the original main whl files to the libs directory
awscli-1.18.183-py2.py3-none-any.whl
boto3-1.16.23-py2.py3-none-any.whl
Package as a zip file
cd libs
zip ../boto3-depends.zip *
Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path.
Note: it is the Referenced files path, not the Python library path.
Placeholder code to install the latest awscli and boto3 and load them into an AWS Glue Python Shell job:
import os.path
import subprocess
import sys
# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

zip_file = get_referenced_filepath("awswrangler-depends.zip")

# NOTE: the two subprocess calls below are a best-guess reconstruction;
# adjust the target directory and wheel names to your environment.
subprocess.run(["unzip", "-o", zip_file, "-d", "/glue/lib/installation"])

# Can't install --user, or without "-t ." because of permissions issues on the filesystem
subprocess.run("cd /glue/lib/installation && pip3 install *.whl -t .", shell=True)

# Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
keys = list(sys.modules.keys())
for k in keys:
    if 'boto' in k:
        del sys.modules[k]
import boto3
print('boto3 version')
print(boto3.__version__)
Check whether the code works with the latest AWS CLI API.
Thanks
Sarath
Following the sample code provided in the boto3 documentation for using the workmailmessageflow service,
import boto3
client = boto3.client('workmailmessageflow')
triggers an UnknownServiceError: "Unknown service: 'workmailmessageflow'. Valid service names are..."
The boto3 version reported by AWS Lambda Python 3.7 is 1.9.221. Any thoughts on why workmailmessageflow is not recognized?
Ref: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/workmailmessageflow.html
Because workmailmessageflow first appeared in Boto3 1.9.228.
You have to use a custom version of boto3: either use a layer or package the latest version together with your code.
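A minimal sketch of such a guard, assuming the 1.9.228 threshold mentioned above:
import boto3

# workmailmessageflow only exists in boto3 >= 1.9.228, so fail loudly if the
# bundled or layered SDK is still the older Lambda runtime version.
required = (1, 9, 228)
current = tuple(int(part) for part in boto3.__version__.split("."))
if current < required:
    raise RuntimeError(
        "boto3 {} is too old for workmailmessageflow; bundle a newer version "
        "or attach it as a layer".format(boto3.__version__)
    )

client = boto3.client("workmailmessageflow")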
I have a PySpark script which I can run in AWS Glue, but every time I create the job from the UI and copy my code into the job. Is there any way I can automatically create the job from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used while running.)
Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file and then update the stack whenever you need, from the AWS Console or using the CLI.
A template for a Glue job would look like this:
MyJob:
  Type: AWS::Glue::Job
  Properties:
    Command:
      Name: glueetl
      ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
    DefaultArguments:
      "--job-bookmark-option": "job-bookmark-enable"
    ExecutionProperty:
      MaxConcurrentRuns: 2
    MaxRetries: 0
    Name: cf-job1
    Role: !Ref MyJobRole # reference to a Role resource which is not presented here
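If you prefer to drive that template from Python rather than the console or CLI, a rough boto3 sketch could look like this; the stack name and template file name are placeholders:
import boto3

cfn = boto3.client("cloudformation")

# Read the template that contains the Glue job definition shown above.
with open("glue-jobs.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="my-glue-jobs",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],  # needed if the template also creates IAM roles
)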
I created an open source library called datajob to deploy and orchestrate Glue jobs. You can find it on GitHub https://github.com/vincentclaes/datajob and on PyPI.
pip install datajob
npm install -g aws-cdk@1.87.1
You create a file datajob_stack.py that describes your Glue jobs and how they are orchestrated:
from datajob.datajob_stack import DataJobStack
from datajob.glue.glue_job import GlueJob
from datajob.stepfunctions.stepfunctions_workflow import StepfunctionsWorkflow
with DataJobStack(stack_name="data-pipeline-simple") as datajob_stack:

    # here we define 3 glue jobs with a relative path to the source code.
    task1 = GlueJob(
        datajob_stack=datajob_stack,
        name="task1",
        job_path="data_pipeline_simple/task1.py",
    )
    task2 = GlueJob(
        datajob_stack=datajob_stack,
        name="task2",
        job_path="data_pipeline_simple/task2.py",
    )
    task3 = GlueJob(
        datajob_stack=datajob_stack,
        name="task3",
        job_path="data_pipeline_simple/task3.py",
    )

    # we instantiate a step functions workflow and add the sources
    # we want to orchestrate.
    with StepfunctionsWorkflow(
        datajob_stack=datajob_stack, name="data-pipeline-simple"
    ) as sfn:
        [task1, task2] >> task3
To deploy your code to Glue, execute:
export AWS_PROFILE=my-profile
datajob deploy --config datajob_stack.py
Any feedback is much appreciated!
Yes, it is possible. For instance, you can use the boto3 framework for this purpose:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
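A hedged sketch of create_job based on those docs; the job name, role ARN and script location are placeholders:
import boto3

glue = boto3.client("glue")

# Create a Glue job from a script that already lives in S3.
# The name, role ARN and paths below are placeholders.
response = glue.create_job(
    Name="my-pyspark-job",
    Role="arn:aws:iam::123456789012:role/MyGlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/my_script.py",
        "PythonVersion": "3",
    },
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
    GlueVersion="2.0",
    MaxRetries=0,
)
print(response["Name"])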
I wrote a script which does the following:
We have a (glue)_dependency.txt file; the script gets the path of every dependency file and creates a zip file.
It uploads the Glue file and the zip file to S3 using s3 sync.
Optionally, if there is any change in the job settings, it will re-deploy the CloudFormation template.
You may write a shell script to do this.
I'd like to use the boto3 put_bucket_encryption call inside a Lambda function, but the current Lambda execution environment is at botocore version 1.7.37, and put_bucket_encryption was introduced in botocore 1.7.41.
So I'd like to package up my local version of boto3/botocore.
I've included pip packages in Lambda functions using the Serverless Framework along with serverless-python-requirements, but it doesn't seem to work for boto3/botocore.
The function responds to a CreateBucket event and tries to put_bucket_encryption, but fails with
'S3' object has no attribute 'put_bucket_encryption': AttributeError
How can I force my lambda function to use a more up to date botocore?
I was able to resolve this with kichik's help.
What I missed was the section about omitting packages in the serverless-python-requirements docs. Specifically:
By default, this will not install the AWS SDKs that are already installed on Lambda.
So in my serverless.yml I added the following, which overrides that default so boto3/botocore are deployed with the function:
custom:
  pythonRequirements:
    noDeploy:
      - pytest
Once I deployed, it was using my packaged versions of boto3/botocore