Deploy Pytidylib module in AWS Lambda using Lambda Layers - amazon-web-services

I am trying to deploy the pytidylib Python module into an AWS Lambda function using layers.
I created the directory structure described in the AWS docs and created a new layer.
Now the pytidylib code needs some libraries from /usr/lib, but I installed the libraries in /python/lib/python3.7/site-packages/. To resolve this I added the path to the environment PATH of the AWS Linux platform, but the issue is still not resolved.
Below is my code:
import os
import sys

import boto3

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    s3 = boto3.client("s3")
    print(sys.platform)
    ld_library_path = os.environ["LD_LIBRARY_PATH"]
    print("old ld_library_path is", ld_library_path)
    ld_library_path = ld_library_path + ":/opt/python/lib/python3.7/site-packages/"
    os.environ["LD_LIBRARY_PATH"] = ld_library_path
    print("ld_library_path after set is", os.environ["LD_LIBRARY_PATH"])
ld_library_path after set is /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib:/opt/python/lib/python3.7/site-packages/
I want to understand whether there is any way I can make this work through some changes in the code, so that the pytidylib module runs through layers.
Below is the error:
[ERROR] OSError: Could not load libtidy using any of these names:
libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,libtidy.dylib,tidy
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 68, in lambda_handler
document, errors = tidy_document(doc)
File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 222, in tidy_document
return get_module_tidy().tidy_document(text, options)
File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 234, in get_module_tidy
_tidy = Tidy()
File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 99, in __init__
+ ",".join(lib_names))

I tried to replicate your issue, and for me the Pytidylib layer works as expected.
This is the way I constructed the layer, in case you want to give it a try. It uses the Docker-based technique described in the recent AWS blog:
How do I create a Lambda layer using a simulated Lambda environment with Docker?
I created the Pytidylib layer as follows:
Create an empty folder, e.g. mylayer.
Go to the folder and create a requirements.txt file with the content:
pytidylib
Run the following docker command (adjust the Python version to your needs):
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
Create the layer zip:
zip -r pytidylayer.zip python > /dev/null
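Optionally, before uploading, a quick local sanity check (a sketch using only the standard library) that the archive keeps the python/ prefix, which Lambda mounts under /opt:
import zipfile

with zipfile.ZipFile("pytidylayer.zip") as zf:
    # Layer contents are mounted at /opt, so all entries must live under python/.
    assert all(name.startswith("python/") for name in zf.namelist())
    print(zf.namelist()[:5])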
Create the Lambda layer from pytidylayer.zip in the AWS Console. Don't forget to set Compatible runtimes to python3.8.
Add the layer to the lambda and test it using the following lambda function:
import json
import tidylib

def lambda_handler(event, context):
    print(dir(tidylib))
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
The function executed correctly:
['PersistentTidy', 'Tidy', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'release_tidy_doc', 'sink', 'tidy', 'tidy_document', 'tidy_fragment']

I solved this by adding the path of the tidy shared library (libtidy.so.5.2.0) to the LD_LIBRARY_PATH environment variable of the Linux server. Note that setting LD_LIBRARY_PATH from inside the handler, as in the question, likely has no effect, because the dynamic loader only reads that variable once, at process start.
For me the library was pre-installed on an Ubuntu 18.04 server in /usr/lib. Copy the library from this path, place it inside the tidylib folder, create a zip, and follow the lambda-layer creation steps.
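If you cannot change the environment configuration, another route (a sketch, not from the original answer) is to hand pytidylib the absolute path of the bundled library. The traceback above shows Tidy.__init__ iterating over lib_names, so passing a full path there should let ctypes load the file directly; the /opt/lib location assumes the .so was shipped in the layer's lib/ folder.
# Sketch: point pytidylib at the bundled shared object directly.
# Assumes /opt/lib/libtidy.so.5.2.0 exists (shipped via the layer) and that
# your tidylib version accepts the lib_names argument.
from tidylib import Tidy

tidy = Tidy(lib_names=["/opt/lib/libtidy.so.5.2.0"])

def lambda_handler(event, context):
    document, errors = tidy.tidy_document("<p>hello")
    return {"statusCode": 200, "body": document}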

Related

Running Taurus BlazeMeter on AWS Lambda

I am trying to run a BlazeMeter Taurus script with a JMeter script inside via AWS Lambda. I'm hoping that there is a way to run bzt via a local installation in /tmp/bzt instead of looking for a bzt installation on the system, which doesn't really exist since it's Lambda.
This is my lambda_handler.py:
import subprocess
import json

def run_taurus_test(event, context):
    subprocess.call(['mkdir', '/tmp/bzt/'])
    subprocess.call(['pip', 'install', '--target', '/tmp/bzt/', 'bzt'])
    # subprocess.call('ls /tmp/bzt/bin'.split())
    subprocess.call(['/tmp/bzt/bin/bzt', 'tests/taurus_test.yaml'])
    return {
        'statusCode': 200,
        'body': json.dumps('Executing Taurus Test hopefully!')
    }
The taurus_test.yaml runs as expected when testing on my computer with bzt installed via pip normally, so I know the issue isn't with the test script. The same traceback as below appears if I uninstall bzt from my system and try to use a local installation targeted at a certain directory.
This is the traceback in the execution results:
Traceback (most recent call last):
File "/tmp/bzt/bin/bzt", line 5, in <module>
from bzt.cli import main
ModuleNotFoundError: No module named 'bzt'
Technically it's /tmp/bzt/bin/bzt, the executable, that's failing, and I think it's because it's not using the local/targeted installation.
So I'm hoping there is a way to tell bzt to keep using the targeted installation in /tmp/bzt instead of calling the executable there and having it hand off to an installation that doesn't exist elsewhere. Feedback on whether AWS Fargate or EC2 would be better suited for this is also appreciated.
Depending on the size of the bzt package, the solutions are:
Use the recent Lambda Docker (container image) feature; this way, what you run locally is what you get on Lambda.
Use Lambda layers (similar in spirit to Docker): the layer holds the bzt module in the python directory, as described there.
When you package your Lambda, instead of uploading a single Python file, create a ZIP file containing both /path/to/zip_root/lambda_handler.py and the output of pip install --target /path/to/zip_root.
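Separately, a small tweak to the handler from the question may be enough (an untested sketch): the console script at /tmp/bzt/bin/bzt fails to import bzt because the targeted install directory is not on the interpreter's module search path, so you can pass it explicitly via PYTHONPATH.
import json
import os
import subprocess

def run_taurus_test(event, context):
    # Install bzt into /tmp, the only writable filesystem in Lambda.
    subprocess.call(['pip', 'install', '--target', '/tmp/bzt/', 'bzt'])
    # Make the targeted installation visible to the console script, which
    # otherwise tries to import bzt from a system install that doesn't exist.
    env = dict(os.environ, PYTHONPATH='/tmp/bzt')
    subprocess.call(['/tmp/bzt/bin/bzt', 'tests/taurus_test.yaml'], env=env)
    return {'statusCode': 200, 'body': json.dumps('Executing Taurus Test hopefully!')}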

Errors when trying to call pycurl in a lambda on AWS

I want to use pycurl to measure TTFB and TTLB, but am unable to call pycurl in an AWS lambda.
To focus on the issue, let's say I call this simple lambda function:
import json
import pycurl
import certifi

def lambda_handler(event, context):
    client_curl = pycurl.Curl()
    client_curl.setopt(pycurl.CAINFO, certifi.where())
    client_curl.setopt(pycurl.URL, "https://www.arolla.fr/blog/author/edouard-gomez-vaez/")  # set url
    client_curl.setopt(pycurl.FOLLOWLOCATION, 1)
    client_curl.setopt(pycurl.WRITEFUNCTION, lambda x: None)
    content = client_curl.perform()
    dns_time = client_curl.getinfo(pycurl.NAMELOOKUP_TIME)  # DNS time
    conn_time = client_curl.getinfo(pycurl.CONNECT_TIME)  # TCP/IP 3-way handshaking time
    starttransfer_time = client_curl.getinfo(pycurl.STARTTRANSFER_TIME)  # time-to-first-byte time
    total_time = client_curl.getinfo(pycurl.TOTAL_TIME)  # last request time
    client_curl.close()
    data = json.dumps({
        'dns_time': dns_time,
        'conn_time': conn_time,
        'starttransfer_time': starttransfer_time,
        'total_time': total_time,
    })
    return {
        'statusCode': 200,
        'body': data
    }
I have the following error, which is understandable:
Unable to import module 'lambda_function': No module named 'pycurl'
I followed the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/ to create a layer, but then got the following error while generating the layer with docker (I extracted the interesting part):
Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
I even tried to generate the layer by just running on my own machine:
pip install -r requirements.txt -t python/lib/python3.6/site-packages/
zip -r mypythonlibs.zip python > /dev/null
And then uploading the zip as a layer in AWS, but I then had another error when launching the lambda:
Unable to import module 'lambda_function': libssl.so.1.0.0: cannot open shared object file: No such file or directory
It seems that the layer has to be built in a somewhat extended version of the target environment.
After a couple of hours scratching my head, I managed to resolve this issue.
TL;DR: build the layer using a Docker image inherited from the AWS one, but with the needed libraries installed, for instance libcurl-devel, openssl-devel, python36-devel. Have a look at the trick in Note 3 :).
The detailed way:
Prerequisite: have Docker installed.
In an empty directory, copy your requirements.txt containing pycurl (in my case: pycurl~=7.43.0.5).
In this same directory, create the following Dockerfile (cf. Note 3):
FROM public.ecr.aws/sam/build-python3.6
RUN yum install libcurl-devel python36-devel -y
RUN yum install openssl-devel -y
ENV PYCURL_SSL_LIBRARY=openssl
RUN ln -s /usr/include /var/lang/include
Build the docker image:
docker build -t build-python3.6-pycurl .
Build the layer using this image (cf. Note 2), by running:
docker run -v "$PWD":/var/task "build-python3.6-pycurl" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.6/site-packages/; exit"
Zip the layer by running:
zip -r mylayer.zip python > /dev/null
Send the file mylayer.zip to AWS as a layer and make your lambda point to it (using the console, or following the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/).
Test your lambda and celebrate!
Note 1. If you want to use Python 3.8, just change 3.6 or 36 to 3.8 and 38.
Note 2. Do not forget to remove the python folder when regenerating the layer, using admin rights if necessary.
Note 3. Mind the symlink in the last line of the Dockerfile. Without it, gcc won't be able to find some header files, such as Python.h.
Note 4. Compile pycurl with the openssl backend, for it is the ssl backend used in the lambda execution environment. Otherwise you'll get a "libcurl link-time ssl backend (openssl) is different from compile-time ssl backend" error when executing the lambda.
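To verify the result quickly, here is a minimal smoke-test handler (a sketch; pycurl.version also reports the libcurl and SSL backend versions, which is useful for checking Note 4):
import pycurl

def lambda_handler(event, context):
    # pycurl.version is a string like "PycURL/7.43.0.5 libcurl/7.x OpenSSL/1.x ..."
    print(pycurl.version)
    return {'statusCode': 200, 'body': 'pycurl imported successfully'}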

AWS Glue Python Shell Job Connect Timeout Error

I am trying to run an AWS Glue Python Shell job, but it gives me a Connect Timeout Error.
Error Image : https://i.stack.imgur.com/MHpHg.png
Script : https://i.stack.imgur.com/KQxkj.png
It looks like you didn't add a secretsmanager endpoint to your VPC. Since the traffic does not leave the AWS network, there is no internet access inside your Glue job's VPC, so if you want to connect to secretsmanager you need to add a VPC endpoint for it.
Refer to this on how you can add this to your VPC and this to make sure you have properly configured security groups.
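For reference, a sketch of creating such an interface endpoint with boto3 (all IDs are placeholders to replace with your own, and the service name must match your region):
import boto3

ec2 = boto3.client("ec2")

# Placeholder IDs: use the VPC, subnets, and security group of your Glue job.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.secretsmanager",  # adjust the region
    VpcEndpointType="Interface",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)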
Hi,
We got AWS Glue Python Shell working with all dependencies as follows. Glue has an awscli dependency as well, along with boto3.
AWS Glue Python Shell with Internet
Add the awscli and boto3 whl files to the Python library path during Glue job execution. This option is slow, as it has to download and install the dependencies.
Download the following whl files
awscli-1.18.183-py2.py3-none-any.whl
boto3-1.16.23-py2.py3-none-any.whl
Upload the files to an S3 bucket in your given Python library path.
Add the S3 paths of the whl files to the Python library path, giving the full S3 path of each whl file, separated by commas.
AWS Glue Python Shell without Internet connectivity
Reference: AWS Wrangler Glue dependency build
We followed the steps mentioned above for the awscli and boto3 whl files.
Below is the latest requirements.txt, compiled for the newest versions:
colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
Download the dependencies to the libs folder:
pip download -r requirements.txt -d libs
Move the original main whl files also to the libs directory:
awscli-1.18.183-py2.py3-none-any.whl
boto3-1.16.23-py2.py3-none-any.whl
Package as a zip file:
cd libs
zip ../boto3-depends.zip *
Upload boto3-depends.zip to S3 and add the path to the Glue job's Referenced files path.
Note: It is Referenced files path and not Python library path.
Placeholder code to install the latest awscli and boto3 and load them into AWS Glue Python Shell:
import os.path
import subprocess
import sys

# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

zip_file = get_referenced_filepath("awswrangler-depends.zip")

# The exact arguments were lost in the original post; unpacking the
# referenced zip of whl files is the likely intent here.
subprocess.run(["unzip", zip_file])

# Can't install --user, or without "-t ." because of permissions issues on the filesystem
subprocess.run("pip install *.whl -t .", shell=True)

# Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
keys = list(sys.modules.keys())
for k in keys:
    if 'boto' in k:
        del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)
Check that the code works with the latest AWS CLI API.
Thanks
Sarath

AWS Lambda Console - Upgrade boto3 version

I am creating a DeepLens project to recognise people when one of a select group of people is scanned by the camera.
The project uses a lambda, which processes the images and triggers the 'rekognition' AWS API.
When I trigger the API from my local machine, I get a good response.
When I trigger the API from the AWS console, I get a failed response.
Problem
After much digging, I found that 'boto3' (the AWS Python library) is of version:
1.9.62 - on my local machine
1.8.9 - on the AWS console
Question
Can I upgrade the 'boto3' library version in the AWS Lambda console? If so, how?
If you don't want to package a more recent boto3 version with your function, you can download boto3 with each invocation of the Lambda. Remember that /tmp/ is the directory that Lambda allows you to write to, so you can use it to temporarily download boto3:
import sys
from pip._internal import main

main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])
sys.path.insert(0, '/tmp/')

import boto3
from botocore.exceptions import ClientError

def handler(event, context):
    print(boto3.__version__)
You can achieve the same with either a Python function with dependencies or with a Virtual Environment.
These are the available options; other than that, you can also try to contact the Amazon team to see if they can help you with the upgrade.
I know you're asking for a solution through the Console, but this is not possible (to my knowledge).
To solve this you need to provide the boto3 version you require to your lambda (either with the solution from user1998671 or with what Shivang Agarwal proposes). A third solution is to provide the required boto3 version as a layer for the lambda. The big advantage of the layer is that you can re-use it for all your lambdas.
This can be achieved by following the guide from AWS (the following is mainly copied from the linked guide):
IMPORTANT: Make sure to replace boto3-mylayer with a name of your choice.
Create a lib folder by running the following command:
LIB_DIR=boto3-mylayer/python
mkdir -p $LIB_DIR
Install the library to LIB_DIR by running the following command:
pip3 install boto3 -t $LIB_DIR
Zip all the dependencies to /tmp/boto3-mylayer.zip by running the following command:
cd boto3-mylayer
zip -r /tmp/boto3-mylayer.zip .
Publish the layer by running the following command:
aws lambda publish-layer-version --layer-name boto3-mylayer --zip-file fileb:///tmp/boto3-mylayer.zip
The command returns the new layer's Amazon Resource Name (ARN), similar to the following one:
arn:aws:lambda:region:$ACC_ID:layer:boto3-mylayer:1
To attach this layer to your lambda execute the following:
aws lambda update-function-configuration --function-name <name-of-your-lambda> --layers <layer ARN>
To verify the boto3 version in your lambda, you can simply add the following two print commands:
print(boto3.__version__)
print(botocore.__version__)
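For completeness, a minimal handler sketch with the imports both prints require:
import boto3
import botocore

def lambda_handler(event, context):
    print(boto3.__version__)
    print(botocore.__version__)
    return {'statusCode': 200}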

boto3 throws error when packaged under rpm

I am using boto3 in my project, and when I package it as an RPM it raises an error while initializing the EC2 client.
<class 'botocore.exceptions.DataNotFoundError'>:Unable to load data for: _endpoints. Traceback -Traceback (most recent call last):
File "roboClientLib/boto/awsDRLib.py", line 186, in _get_ec2_client
File "boto3/__init__.py", line 79, in client
File "boto3/session.py", line 200, in client
File "botocore/session.py", line 789, in create_client
File "botocore/session.py", line 682, in get_component
File "botocore/session.py", line 809, in get_component
File "botocore/session.py", line 179, in <lambda>
File "botocore/session.py", line 475, in get_data
File "botocore/loaders.py", line 119, in _wrapper
File "botocore/loaders.py", line 377, in load_data
DataNotFoundError: Unable to load data for: _endpoints
Can anyone help me here? Probably boto3 requires some runtime resolution which it is not able to do when packaged as an RPM.
I tried using LD_LIBRARY_PATH in /etc/environment, which is not working:
export LD_LIBRARY_PATH="/usr/lib/python2.6/site-packages/boto3:/usr/lib/python2.6/site-packages/boto3-1.2.3.dist-info:/usr/lib/python2.6/site-packages/botocore:
I faced the same issue:
botocore.exceptions.DataNotFoundError: Unable to load data for: ec2/2016-04-01/service-2
I figured out that the data directory was missing. Updating botocore by running the following solved my issue:
pip install --upgrade botocore
Botocore depends on a set of service definition files that it uses to generate clients on the fly. Boto3 further depends on another set of files that it uses to generate resource clients. You will need to include these in any installs of boto3 or botocore. The files will need to be located in the 'data' folder of the root of the respective library.
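To illustrate, here is a hedged setup.py sketch for a build that vendors the libraries; the package_data globs are an assumption based on the layout of the data folders (endpoints at data/, service models at data/&lt;service&gt;/&lt;api-version&gt;/) and should be verified against the versions you ship:
# setup.py sketch: keep the botocore/boto3 JSON data files in the build,
# since clients are generated at runtime from these definitions.
from setuptools import setup, find_packages

setup(
    name="myapp",  # hypothetical package name
    version="1.0",
    packages=find_packages(),
    package_data={
        "botocore": ["data/*.json", "data/*/*/*.json"],
        "boto3": ["data/*.json", "data/*/*/*.json"],
    },
)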
I faced a similar issue which was due to an old version of botocore. Once I updated it, it started working.
Please consider using the command below:
pip install --upgrade botocore
Also please ensure you have set up a boto configuration profile.
Boto3 searches for credentials in the following order:
Passing credentials as parameters in the boto3.client() method
Passing credentials as parameters when creating a Session object
Environment variables
Shared credential file (~/.aws/credentials)
AWS config file (~/.aws/config)
Assume Role provider
Boto2 config file (/etc/boto.cfg and ~/.boto)
Instance metadata service on an Amazon EC2 instance that has an IAM role configured.
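For illustration, the first two entries in the list above correspond to passing credentials explicitly in code; a minimal sketch with placeholder values (real deployments should prefer the later mechanisms):
import boto3

# Placeholder credentials: explicit parameters take precedence over
# environment variables and the shared credential/config files.
session = boto3.Session(
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)
ec2 = session.client("ec2")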