Errors when trying to call pycurl in a lambda on AWS - amazon-web-services

I want to use pycurl in order to measure TTFB and TTLB, but am unable to call pycurl from an AWS Lambda.
To focus on the issue, let's say I call this simple lambda function:
import json
import pycurl
import certifi

def lambda_handler(event, context):
    client_curl = pycurl.Curl()
    client_curl.setopt(pycurl.CAINFO, certifi.where())
    client_curl.setopt(pycurl.URL, "https://www.arolla.fr/blog/author/edouard-gomez-vaez/") #set url
    client_curl.setopt(pycurl.FOLLOWLOCATION, 1)
    client_curl.setopt(pycurl.WRITEFUNCTION, lambda x: None)
    content = client_curl.perform()
    dns_time = client_curl.getinfo(pycurl.NAMELOOKUP_TIME) #DNS time
    conn_time = client_curl.getinfo(pycurl.CONNECT_TIME) #TCP/IP 3-way handshaking time
    starttransfer_time = client_curl.getinfo(pycurl.STARTTRANSFER_TIME) #time-to-first-byte time
    total_time = client_curl.getinfo(pycurl.TOTAL_TIME) #last request time
    client_curl.close()
    data = json.dumps({'dns_time': dns_time,
                       'conn_time': conn_time,
                       'starttransfer_time': starttransfer_time,
                       'total_time': total_time,
                       })
    return {
        'statusCode': 200,
        'body': data
    }
I have the following error, which is understandable:
Unable to import module 'lambda_function': No module named 'pycurl'
I followed the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/ in order to create a layer, but then got the following error while generating the layer with Docker (I extracted the interesting part):
Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
I even tried to generate the layer by simply running the following on my own machine:
pip install -r requirements.txt -t python/lib/python3.6/site-packages/
zip -r mypythonlibs.zip python > /dev/null
and then uploading the zip as a layer in AWS, but I then get another error when launching the lambda:
Unable to import module 'lambda_function': libssl.so.1.0.0: cannot open shared object file: No such file or directory
It seems that the layer has to be built in an environment that matches the Lambda runtime but is extended with additional libraries.

After a couple of hours scratching my head, I managed to resolve this issue.
TL;DR: build the layer using a Docker image that inherits from the AWS one but has the needed libraries installed, for instance libcurl-devel, openssl-devel and python36-devel. Have a look at the trick in Note 3 :).
The detailed way:
Prerequisite: having Docker installed
In an empty directory, copy your requirements.txt containing pycurl (in my case: pycurl~=7.43.0.5).
In this same directory, create the following Dockerfile (cf Note 3):
FROM public.ecr.aws/sam/build-python3.6
RUN yum install libcurl-devel python36-devel -y
RUN yum install openssl-devel -y
ENV PYCURL_SSL_LIBRARY=openssl
RUN ln -s /usr/include /var/lang/include
Build the docker image:
docker build -t build-python3.6-pycurl .
Build the layer using this image (cf Note 2) by running:
docker run -v "$PWD":/var/task "build-python3.6-pycurl" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.6/site-packages/; exit"
Zip the layer by running:
zip -r mylayer.zip python > /dev/null
Send the file mylayer.zip to AWS as a layer and make your lambda point to it (using the console, or following the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/; a boto3 sketch is shown after these steps).
Test your lambda and celebrate!
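For completeness, publishing the zip and attaching it to the function can also be scripted. Here is a minimal boto3 sketch of that last step; the layer name, function name and runtime are placeholders, so adapt them to your setup:
import boto3

client = boto3.client("lambda")

# Publish the zipped layer (layer name and runtime are placeholders)
with open("mylayer.zip", "rb") as f:
    layer = client.publish_layer_version(
        LayerName="pycurl-layer",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.6"],
    )

# Point the lambda (placeholder name) to the new layer version
client.update_function_configuration(
    FunctionName="my-pycurl-lambda",
    Layers=[layer["LayerVersionArn"]],
)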
Note 1. If you want to use Python 3.8, just replace 3.6 or 36 with 3.8 or 38.
Note 2. Do not forget to remove the python folder when regenerating the layer, using admin rights if necessary.
Note 3. Mind the symlink in the last line of the Dockerfile. Without it, gcc won't be able to find some header files, such as Python.h.
Note 4. Compile pycurl with the openssl backend, since that is the SSL backend used in the Lambda execution environment. Otherwise you'll get a "libcurl link-time ssl backend (openssl) is different from compile-time ssl backend" error when executing the lambda.
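If you want to confirm which SSL backend your pycurl build ended up with, a quick check (works locally or inside the lambda) is to print pycurl's version string:
import pycurl

# The version string names the SSL backend libcurl was built against,
# e.g. "PycURL/7.43.0.5 libcurl/7.61.1 OpenSSL/1.0.2k ..."
print(pycurl.version)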

Related

What is the correct configuration AWS SageMaker-Python-SDK to achieve local debugging/training with Apple M1 Pro

I want to run an RL training job on AWS SageMaker (script given below). But since the project is complex, I was hoping to do a test run using SageMaker Local Mode (on my M1 MacBook Pro) before submitting to paid instances. However, I am struggling to make this local run succeed even with a simple training task.
I have used tensorflow-metal and tensorflow-macos when running local training jobs (without SageMaker), but I did not see anywhere I can specify these in framework_version, nor am I sure that "local_gpu", which is the correct argument for a normal Linux machine with a GPU, applies to Apple Silicon (M1 Pro).
I searched all over but cannot find a case where this is addressed. (Very odd, am I doing something wrong? If so, please correct me.) If not, and anyone knows of a configuration, a Docker image, or an example properly done with an M1 Pro, please share.
I tried to run the following code, which hangs after logging in. (If you are trying to run the code, use any simple training script as entry_point, and make sure to log in with a command matching your region, using the AWS CLI as follows.)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
##main.py
import boto3
import sagemaker
import os
import keras
import numpy as np
from keras.datasets import fashion_mnist
from sagemaker.tensorflow import TensorFlow

sess = sagemaker.Session()
#role = <'arn:aws:iam::0000000000000:role/CFN-SM-IM-Lambda-Catalog-sk-SageMakerExecutionRole-BlaBlaBla'> #KINDLY ADD YOUR ROLE HERE

(x_train, y_train), (x_val, y_val) = fashion_mnist.load_data()
os.makedirs("./data", exist_ok = True)
np.savez('./data/training', image=x_train, label=y_train)
np.savez('./data/validation', image=x_val, label=y_val)

# Train on local data. S3 URIs would work too.
training_input_path = 'file://data/training.npz'
validation_input_path = 'file://data/validation.npz'

# Store model locally. An S3 URI would work too.
output_path = 'file:///tmp/model/'

tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          instance_count=1,
                          instance_type='local_gpu', # 'local' to train on the local CPU, 'local_gpu' if it has a GPU
                          framework_version='2.1.0',
                          py_version='py3',
                          hyperparameters={'epochs': 1},
                          output_path=output_path
                          )

tf_estimator.fit({'training': training_input_path, 'validation': validation_input_path})
The prebuilt SageMaker Docker Images for Deep Learning don't have Arm-based support yet.
You can see Available Deep Learning Containers Images here.
The solution is to build your own Docker image and use it with SageMaker.
This is an example Dockerfile that uses miniconda to install TensorFlow dependencies:
FROM arm64v8/ubuntu
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
nginx \
ca-certificates \
gcc \
linux-headers-generic \
libc-dev
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.9.2-Linux-aarch64.sh
RUN chmod a+x Miniconda3-py38_4.9.2-Linux-aarch64.sh
RUN bash Miniconda3-py38_4.9.2-Linux-aarch64.sh -b
ENV PATH /root/miniconda3/bin/:$PATH
COPY ml-dependencies.yml ./
RUN conda env create -f ml-dependencies.yml
ENV PATH /root/miniconda3/envs/ml-dependencies/bin:$PATH
This is the ml-dependencies.yml:
name: ml-dependencies
dependencies:
  - python=3.8
  - numpy
  - pandas
  - scikit-learn
  - tensorflow==2.8.2
  - pip
  - pip:
    - sagemaker-training
And this is how you'll run the training using SageMaker Script mode:
from sagemaker.estimator import Estimator

image = 'sagemaker-tensorflow2-graviton-training-toolkit-local'

california_housing_estimator = Estimator(
    image_uri=image,
    entry_point='california_housing_tf2.py',
    source_dir='code',
    role=DUMMY_IAM_ROLE,
    instance_count=1,
    instance_type='local',
    hyperparameters={'epochs': 10,
                     'batch_size': 64,
                     'learning_rate': 0.1})

inputs = {'train': 'file://./data/train', 'test': 'file://./data/test'}
california_housing_estimator.fit(inputs, logs=True)
You can find the full working sample code on the Amazon SageMaker Local Mode Examples GitHub repository here.

Deploy Pytidylib module in aws lambda by using Lamda Layers

I am trying to deploy the pytidylib Python module into an AWS Lambda function by using layers.
I have created the path as described in the AWS docs and created a new layer.
Now the pytidylib code needs some libraries from /usr/lib, but I have installed the libraries in /python/lib/python3.7/site-packages/, so to resolve this I appended that path to LD_LIBRARY_PATH on the AWS Linux platform, but the issue is still not resolved.
Below is my code:
import boto3
import os
import sys

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    s3 = boto3.client("s3")
    print(sys.platform)
    ld_library_path = os.environ["LD_LIBRARY_PATH"]
    print("old ld_library_path is ", ld_library_path)
    ld_library_path = ld_library_path + ":/opt/python/lib/python3.7/site-packages/"
    os.environ["LD_LIBRARY_PATH"] = ld_library_path
    print("ld_library_path after set is ", os.environ["LD_LIBRARY_PATH"])
which prints:
ld_library_path after set is /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib:/opt/python/lib/python3.7/site-packages/
I want to understand whether there is any way I can make this work through some changes in the code and get the pytidylib module to run through layers.
Below is the error:
[ERROR] OSError: Could not load libtidy using any of these names:
libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,libtidy.dylib,tidy
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 68, in lambda_handler
    document, errors = tidy_document(doc)
  File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 222, in tidy_document
    return get_module_tidy().tidy_document(text, options)
  File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 234, in get_module_tidy
    _tidy = Tidy()
  File "/opt/python/lib/python3.7/site-packages/tidylib/tidy.py", line 99, in __init__
    + ",".join(lib_names))
I tried to replicate your issue, and for me the Pytidylib layer works as expected.
This is the way I used to construct the layer, in case you want to give it a try. It involves a docker tool described in the recent AWS blog:
How do I create a Lambda layer using a simulated Lambda environment with Docker?
I created Pytidylib layer as follows:
Create an empty folder, e.g. mylayer.
Go to the folder and create a requirements.txt file with the content:
Pytidylib
Run the following docker command (adjust the python version to your needs):
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.8/site-packages/; exit"
Create layer as zip:
zip -r pytidylayer.zip python > /dev/null
Create a lambda layer from pytidylayer.zip in the AWS Console. Don't forget to set Compatible runtimes to python3.8.
Add the layer to the lambda and test using the following lambda function:
import json
import tidylib

def lambda_handler(event, context):
    print(dir(tidylib))
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
The function executed correctly:
['PersistentTidy', 'Tidy', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'release_tidy_doc', 'sink', 'tidy', 'tidy_document', 'tidy_fragment']
I solved this by adding the path of the tidy library (libtidy.so.5.2.0) to the LD_LIBRARY_PATH environment variable of the Linux server.
For me the library was preinstalled on an Ubuntu 18.04 server in /usr/lib. Copy the library from this path, place it inside the tidylib folder, create a zip, and follow the steps of lambda-layer creation.
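To double-check the bundled library before wiring it into pytidylib, a small diagnostic like this can help (the path is a hypothetical example of where the .so ends up inside the layer, per the steps above):
import ctypes
import os

# Hypothetical location of the .so inside the layer; adjust to your layer layout
lib_path = "/opt/python/lib/python3.7/site-packages/tidylib/libtidy.so.5.2.0"

print(os.path.exists(lib_path))           # is the file actually shipped in the layer?
print(os.environ.get("LD_LIBRARY_PATH"))  # does the loader search path include its directory?
ctypes.CDLL(lib_path)                     # raises OSError if the .so itself cannot be loaded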

how to use pyhive in lambda function?

I've written a function that uses pyhive to read from Hive. Running it locally works fine. However, when trying to use it in a Lambda function I got the error:
"Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'"
I've tried to use the guidelines in this link:
https://github.com/cloudera/impyla/issues/201
However, I wasn't able to run the last command:
yum install cyrus-sasl-lib cyrus-sasl-gssapi cyrus-sasl-md5
since the system I was using to build is Ubuntu, which doesn't have yum.
I tried to install these packages instead (using apt-get):
sasl2-bin libsasl2-2 libsasl2-dev libsasl2-modules libsasl2-modules-gssapi-mit
like described in:
python cannot connect hiveserver2
But still no luck. Any ideas?
Thanks,
Nir.
You can follow this GitHub issue. I was able to connect to HiveServer2 with LDAP authentication using the pyhive library in AWS Lambda with Python 2.7. What I did to make it work:
Launch an EC2 instance or a container with the AMI used by Lambda.
Run the following commands to install the required dependencies
yum upgrade
yum install gcc
yum install gcc-c++
sudo yum install cyrus-sasl cyrus-sasl-devel cyrus-sasl-ldap #include the cyrus-sasl dependency for the authentication mechanism you are using to connect to hive
pip install six==1.12.0
Bundle up /usr/lib64/sasl2/ with the Lambda package and set os.environ['SASL_PATH'] = os.path.join(os.getcwd(), 'path/to/sasl2'). Verify that the .so files are present at the os.environ['SASL_PATH'] path (see the check sketched after the code below).
My Lambda code looks like:
from pyhive import hive
import logging
import os

os.environ['SASL_PATH'] = os.path.join(os.getcwd(), 'lib/sasl2')

log = logging.getLogger()
log.setLevel(logging.INFO)
log.info('Path: %s', os.environ['SASL_PATH'])

def lambda_handler(event, context):
    cursor = hive.connect(host='hiveServer2Ip', port=10000, username='userName', auth='LDAP', password='password').cursor()
    SHOW_TABLE_QUERY = "show tables"
    cursor.execute(SHOW_TABLE_QUERY)
    tables = cursor.fetchall()
    log.info('tables: %s', tables)
    log.info('done')
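To cover the "verify the .so files are present" step above, a throwaway check like this can be added temporarily (the lib/sasl2 location simply mirrors the code above):
import os

sasl_path = os.path.join(os.getcwd(), 'lib/sasl2')  # same location as in the handler above
# List the bundled SASL plugins; an empty list means the package is missing the .so files
print([f for f in os.listdir(sasl_path) if f.endswith('.so')])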

AWS Lambda Error: Unzipped size must be smaller than 262144000 bytes

I am developing a lambda function which uses the ResumeParser library written in Python 2.7. But when I deploy this function, including the library, to AWS it throws the following error:
Unzipped size must be smaller than 262144000 bytes
Perhaps you did not exclude development packages, which made your file grow that big.
In my case (for NodeJS) I was missing the following in my serverless.yml:
package:
  exclude:
    - node_modules/**
    - venv/**
See if there are similar for Python or your case.
This is a hard limit which cannot be changed:
AWS Lambda Limit Errors
Functions that exceed any of the limits listed in the previous limits tables will fail with an exceeded limits exception. These limits are fixed and cannot be changed at this time. For example, if you receive the exception CodeStorageExceededException or an error message similar to "Code storage limit exceeded" from AWS Lambda, you need to reduce the size of your code storage.
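If you are not sure how far over the limit you are, a rough check is to sum the unpacked size of the folder you zip for deployment (a minimal sketch; 'package' is a placeholder for that folder):
import os

def unpacked_size(path):
    """Total size in bytes of all files under path, roughly what Lambda counts after unzipping."""
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(path)
        for name in files
    )

# 'package' is a placeholder for the folder you zip and upload
print(unpacked_size("package"), "bytes; the hard limit is 262144000")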
You need to reduce the size of your package. If you have large binaries, place them in S3 and download them on bootstrap. Likewise for dependencies: you can pip install or easy_install them from an S3 location, which will be faster than pulling from pip repos.
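As a minimal sketch of the download-on-bootstrap idea (the bucket, key and file names are placeholders): fetch the large artifact into /tmp outside the handler, so it only happens on cold start:
import os
import boto3

s3 = boto3.client("s3")
ARTIFACT_PATH = "/tmp/big-binary.bin"  # /tmp is the writable scratch space in Lambda

# Runs once per container (cold start), not on every invocation
if not os.path.exists(ARTIFACT_PATH):
    s3.download_file("my-artifacts-bucket", "big-binary.bin", ARTIFACT_PATH)

def lambda_handler(event, context):
    # Use the locally cached artifact here
    return {"statusCode": 200, "body": str(os.path.getsize(ARTIFACT_PATH))}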
The best solution to this problem is to deploy your Lambda function using a Docker container that you've built and pushed to AWS ECR. Lambda container images have a limit of 10 GB.
Here's an example using Python flavored AWS CDK
from aws_cdk import aws_lambda as _lambda

self.lambda_from_image = _lambda.DockerImageFunction(
    scope=self,
    id="LambdaImageExample",
    function_name="LambdaImageExample",
    code=_lambda.DockerImageCode.from_image_asset(
        directory="lambda_funcs/LambdaImageExample"
    ),
)
An example Dockerfile contained in the directory lambda_funcs/LambdaImageExample alongside my lambda_func.py and requirements.txt:
FROM amazon/aws-lambda-python:latest
LABEL maintainer="Wesley Cheek"
RUN yum update -y && \
yum install -y python3 python3-dev python3-pip gcc && \
rm -Rf /var/cache/yum
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY lambda_func.py ./
CMD ["lambda_func.handler"]
Run cdk deploy and the Lambda function will be automagically bundled into an image along with its dependencies specified in requirements.txt, pushed to an AWS ECR repository, and deployed.
This Medium post was my main inspiration
Edit:
(More details about this solution can be found in my Dev.to post here)
A workaround that worked for me:
Install pyminifier:
pip install pyminifier
Go to the library folder that you want to zip. In my case I wanted to zip the site-packages folder in my virtual env, so I created a site-packages-min folder at the same level as site-packages. Run the following shell script to minify the Python files and create an identical structure in the site-packages-min folder. Zip and upload these files to S3.
#!/bin/bash
for f in $(find site-packages -name '*.py')
do
    ori=$f
    res=${f/site-packages/site-packages-min}
    filename=$(echo $res| awk -F"/" '{print $NF}')
    echo "$filename"
    path=${res%$filename}
    mkdir -p $path
    touch $res
    pyminifier --destdir=$path $ori >> $res || cp $ori $res
done
HTH
As stated by Greg Wozniak, you may just have imported useless directories like venv and node_modules.
package.exclude is deprecated and was removed in Serverless 4; you should now use package.patterns instead:
package:
  patterns:
    - '!node_modules/**'
    - '!venv/**'
In case you're using CloudFormation, in your template yaml file, make sure your 'CodeUri' property includes only your necessary code files and does not contain stuff like the .aws-sam directory (which is big) etc.

elastic beanstalk: incremental push git

When I try to push incremental changes to the AWS Elastic Beanstalk solution, I get the following:
$ git aws.push
Updating the AWS Elastic Beanstalk environment None...
Error: Failed to get the Amazon S3 bucket name
I've already added FULLS3Access to my AWS user's policies.
I had a similar issue today and here are the steps I followed to investigate:
I modified line 133 of .git/AWSDevTools/aws/dev_tools.py to print the exception, like so:
except Exception, e:
    print e
(Please mind the indentation, as Python is whitespace-sensitive.)
I ran the command git aws.push again,
and here is the exception printed:
BotoServerError: 403 Forbidden
{"Error":{"Code":"SignatureDoesNotMatch","Message":"Signature not yet current: 20150512T181122Z is still later than 20150512T181112Z (20150512T180612Z + 5 min.)","Type":"Sender"},"
The issue was a time difference between the server and my machine; I corrected it and it started working fine.
Basically, the exception helps you find the exact root cause; it may be related to the secret key as well.
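If you suspect the same clock-skew problem, a rough check using only the standard library is to compare your local UTC time with a server Date header (any AWS endpoint works; s3.amazonaws.com is just an example):
import datetime
import email.utils
import urllib.error
import urllib.request

req = urllib.request.Request("https://s3.amazonaws.com", method="HEAD")
try:
    date_header = urllib.request.urlopen(req).headers["Date"]
except urllib.error.HTTPError as e:  # even an error response carries a Date header
    date_header = e.headers["Date"]

server_time = email.utils.parsedate_to_datetime(date_header)
local_time = datetime.datetime.now(datetime.timezone.utc)
print("clock skew:", local_time - server_time)  # more than ~5 minutes breaks request signing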
It may have something to do with the boto-library (related thread). If you are on ubuntu/debian try this:
Remove old version of boto
sudo apt-get remove python-boto
Install newer version
sudo apt-get install python-pip
sudo pip install -U boto
Other systems (e.g. Mac)
Via easy_install
sudo easy_install pip
pip install boto
Or simply build from source
git clone git://github.com/boto/boto.git
cd boto
python setup.py install
Had the same problem a moment ago.
Note:
I just noticed your environment is called None. Did you follow all instructions and execute eb config/eb init?
One more try:
Add export PATH=$PATH:<path to unzipped eb CLI package>/AWSDevTools/Linux/ to your path and execute AWSDevTools-RepositorySetup.sh; maybe something is wrong with your repository setup (notice the None weirdness). Other possible solutions:
Double-check your AWS credentials (maybe you are using different key IDs or the wrong credentials-file format)
Old/mismatching versions of eb client & python (check with eb -v and python -v) (current client is this)
Use Amazon's policy validator to double-check that your AWS user is allowed to perform all actions
If all that doesn't help, I'm out of options. Good luck.