How to install Google or-tools on AWS Lambda?

I've been successfully using Google's or-tools on AWS EC2 instances, but I've recently been looking into including them in AWS Lambda functions and can't get them to run.
Function debug.py
Below is a basic function that imports pywrapcp from ortools, which should succeed if everything is set up correctly.
from ortools.constraint_solver import pywrapcp

def handler(event, context):
    print(pywrapcp)

if __name__ == '__main__':
    handler(None, None)
Failing Module Import
I created a package.sh script that copies all dependencies to the project following Amazon's instructions before creating a zip archive. Running the deployed code results in this:
Unable to import module 'debug': No module named ortools.constraint_solver
Contents of package.sh
#!/bin/bash
DEST_DIR=$(dirname $(realpath -s $0));
echo "Copy all native libraries...";
mkdir -p ./lib && find $directory -type f -name "*.so" | xargs cp -t ./lib;
echo "Create package...";
zip -r dist.zip debug.py lib;
rm -r ./lib;
echo "Add dependencies from $VIRTUAL_ENV to $DEST_DIR/dist.zip";
cd $VIRTUAL_ENV/lib/python2.7/site-packages;
zip -ur $DEST_DIR/dist.zip ./** -x;
When I copy the ortools folder from ortools-4.4.3842-py2.7-linux-x86_64.egg directly into the project root, it finds ortools but then fails to import pywrapcp. This may be related to a failure loading the native libraries, but I'm not sure, since the logs don't show much detail.
Unable to import module 'debug': cannot import name pywrapcp
Any ideas?

Following the discussion on Google or-tools, I put together a packaging script that works around the issues by installing the dependencies in a way that works for AWS Lambda.
The key part is that the contents of the egg packages have to be copied manually into the Lambda project folder and given the correct permissions so that they are accessible at runtime.
#!/bin/sh
easy_install3 py3-ortools
find "/opt/python3/lib/python3.6/site-packages" -path "*.egg/*" -not -name "EGG-INFO" -maxdepth 2 -exec cp -r {} ./dist \;
chmod -R 755 ./dist
Instead of creating and configuring an EC2 instance, you can use Docker to create a deployable package locally, see or-tools-lambda for details.
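A minimal sketch of the Docker approach (the build image, target paths, and the PyPI package name ortools are assumptions on my part; the or-tools-lambda repository may do this differently):
docker run --rm -v "$PWD":/var/task public.ecr.aws/sam/build-python3.6 \
    /bin/sh -c "pip install ortools -t ./package && chmod -R 755 ./package"
cd package && zip -r ../dist.zip . && cd ..
zip dist.zip debug.py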

Firstly, the underlying AWS Lambda execution environment is Amazon Linux, while or-tools is not tested beyond the environments below, as per https://github.com/google/or-tools:
Ubuntu 14.04 and 16.04 up (64-bit).
Mac OS X El Capitan with Xcode 7.x (64 bit).
Microsoft Windows with Visual Studio 2013 and 2015 (64-bit).
Test your code by launching an instance with one of the AMIs that AWS Lambda uses, from the list here: http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
If it works, use pip to install the dependencies/libraries at the root level of your project directory and then zip it up. Do not copy the libraries manually into your project directory.
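For example (a sketch; the PyPI package name ortools is an assumption here):
pip install ortools -t .    # install the library and its dependencies into the project root
zip -r dist.zip .           # zip the project, including the installed packages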

Related

Errors when trying to call pycurl in a lambda on AWS

I want to use pycurl in order to measure TTFB and TTLB, but I am unable to call pycurl from an AWS Lambda.
To focus on the issue, let's say I call this simple lambda function:
import json
import pycurl
import certifi

def lambda_handler(event, context):
    client_curl = pycurl.Curl()
    client_curl.setopt(pycurl.CAINFO, certifi.where())
    client_curl.setopt(pycurl.URL, "https://www.arolla.fr/blog/author/edouard-gomez-vaez/")  # set url
    client_curl.setopt(pycurl.FOLLOWLOCATION, 1)
    client_curl.setopt(pycurl.WRITEFUNCTION, lambda x: None)
    content = client_curl.perform()
    dns_time = client_curl.getinfo(pycurl.NAMELOOKUP_TIME)  # DNS time
    conn_time = client_curl.getinfo(pycurl.CONNECT_TIME)  # TCP/IP 3-way handshaking time
    starttransfer_time = client_curl.getinfo(pycurl.STARTTRANSFER_TIME)  # time-to-first-byte time
    total_time = client_curl.getinfo(pycurl.TOTAL_TIME)  # last request time
    client_curl.close()
    data = json.dumps({
        'dns_time': dns_time,
        'conn_time': conn_time,
        'starttransfer_time': starttransfer_time,
        'total_time': total_time,
    })
    return {
        'statusCode': 200,
        'body': data
    }
I have the following error, which is understandable:
Unable to import module 'lambda_function': No module named 'pycurl'
I followed the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/ in order to create a layer, but then got the following error while generating the layer with Docker (I extracted the interesting part):
Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
I even tried to generate the layer by just running this on my own machine:
pip install -r requirements.txt -t python/lib/python3.6/site-packages/
zip -r mypythonlibs.zip python > /dev/null
And then uploaded the zip as a layer in AWS, but I then got another error when launching the lambda:
Unable to import module 'lambda_function': libssl.so.1.0.0: cannot open shared object file: No such file or directory
It seems that the layer has to be built in a somewhat extended target environment.
After a couple of hours scratching my head, I managed to resolve this issue.
TL;DR: build the layer using a Docker image inherited from the AWS one, but with the needed libraries installed, for instance libcurl-devel, openssl-devel, python36-devel. Have a look at the trick in Note 3 :).
The detailed way:
Prerequisite: having Docker installed
In an empty directory, copy your requirements.txt containing pycurl (in my case: pycurl~=7.43.0.5).
In this same directory, create the following Dockerfile (cf Note 3):
FROM public.ecr.aws/sam/build-python3.6
RUN yum install libcurl-devel python36-devel -y
RUN yum install openssl-devel -y
ENV PYCURL_SSL_LIBRARY=openssl
RUN ln -s /usr/include /var/lang/include
Build the docker image:
docker build -t build-python3.6-pycurl .
Build the layer using this image (cf Note 2) by running:
docker run -v "$PWD":/var/task "build-python3.6-pycurl" /bin/sh -c "pip install -r requirements.txt -t python/lib/python3.6/site-packages/; exit"
Zip the layer by running:
zip -r mylayer.zip python > /dev/null
Send the file mylayer.zip to AWS as a layer and make your lambda point to it (using the console, or following the tutorial https://aws.amazon.com/fr/premiumsupport/knowledge-center/lambda-layer-simulated-docker/).
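If you prefer the CLI to the console, the publish and attach steps look roughly like this (the layer name, function name, and ARN are placeholders; the exact layer ARN is printed by the first command):
aws lambda publish-layer-version --layer-name pycurl-layer \
    --zip-file fileb://mylayer.zip --compatible-runtimes python3.6
aws lambda update-function-configuration --function-name my-lambda \
    --layers arn:aws:lambda:<region>:<account-id>:layer:pycurl-layer:1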
Test your lambda and celebrate!
Note 1. If you want to use Python 3.8, just change 3.6 or 36 to 3.8 and 38.
Note 2. Do not forget to remove the python folder when regenerating the layer, using admin rights if necessary.
Note 3. Mind the symlink in the last line of the Dockerfile. Without it, gcc won't be able to find some header files, such as Python.h.
Note 4. Compile pycurl with the openssl backend, as that is the SSL backend used in the Lambda execution environment. Otherwise you'll get a "libcurl link-time ssl backend (openssl) is different from compile-time ssl backend" error when executing the lambda.
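To sanity-check which SSL backend your build actually links against, you can inspect pycurl's version string (a quick sketch; the output shown is only an example):
python -c "import pycurl; print(pycurl.version)"
# e.g. PycURL/7.43.0.5 libcurl/7.61.1 OpenSSL/1.0.2k ...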

How to download an entire bucket in GCP?

I have a problem downloading an entire folder in GCP. How should I download the whole bucket? I ran this code in the GCP Shell Environment:
gsutil -m cp -R gs://my-uniquename-bucket ./C:\Users\Myname\Desktop\Bucket
and I get an error message: "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command. CommandException: 7 files/objects could not be transferred."
Could someone please point out the mistake in the code line?
To download an entire bucket, you must install the Google Cloud SDK
and then run this command:
gsutil -m cp -R gs://project-bucket-name path/to/local
where path/to/local is a path in your machine's local storage.
The error lies within the destination URL as specified by the error message.
I run this code in GCP Shell Environment
Remember that you are running the command from the Cloud Shell and not in a local terminal or Windows Command Line. Thus, it is throwing that error because it cannot find the path you specified. If you inspect the Cloud Shell's file system/structure, it resembles more that of a Unix environment in which you can specify the destination like such instead: ~/bucketfiles/. Even a simple gsutil -m cp -R gs://bucket-name.appspot.com ./ will work since Cloud Shell can identify the ./ directory which is the current directory.
A workaround to this issue is to perform the command on your Windows Command Line. You would have to install Google Cloud SDK beforehand.
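For instance, from the Windows Command Line the copy would look something like this (using the bucket name from the question; the local folder is just an example and must exist first):
mkdir C:\Users\Myname\Desktop\Bucket
gsutil -m cp -R gs://my-uniquename-bucket C:\Users\Myname\Desktop\Bucket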
Alternatively, this can also be done in Cloud Shell, albeit with an extra step:
Download the bucket objects by running gsutil -m cp -R gs://bucket-name ~/, which will download them into the home directory in Cloud Shell
Transfer the files downloaded into the ~/ (home) directory from Cloud Shell to the local machine, either through the user interface or by running gcloud alpha cloud-shell scp (see the sketch below)
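A sketch of the scp step, run from your local machine with the Cloud SDK installed and authenticated (the file names are placeholders; check gcloud alpha cloud-shell scp --help for the exact syntax in your SDK version):
gcloud alpha cloud-shell scp cloudshell:~/some-object.txt localhost:~/Downloads/some-object.txt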
Your destination path is invalid:
./C:\Users\Myname\Desktop\Bucket
Change to:
/Users/Myname/Desktop/Bucket
C: is a reserved device name. You cannot specify reserved device names in a relative path. ./C: is not valid.
There is not a one-button solution for downloading a full bucket to your local machine through the Cloud Shell.
The best option for an environment like yours (only using the Cloud Shell interface, without gcloud installed on your local system), is to follow a series of steps:
Downloading the whole bucket on the Cloud Shell environment
Zip the contents of the bucket
Upload the zipped file
Download the file through the browser
Clean up:
Delete the local files (local in the context of the Cloud Shell)
Delete the zipped bucket file
Unzip the bucket locally
This has the advantage of only having to download a single file on your local machine.
This might seem like a lot of steps for a non-developer, but it's actually pretty simple:
First, run this on the Cloud Shell:
mkdir /tmp/bucket-contents/
gsutil -m cp -R gs://my-uniquename-bucket /tmp/bucket-contents/
pushd /tmp/bucket-contents/
zip -r /tmp/zipped-bucket.zip .
popd
gsutil cp /tmp/zipped-bucket.zip gs://my-uniquename-bucket/zipped-bucket.zip
Then, download the zipped file through this link: https://storage.cloud.google.com/my-uniquename-bucket/zipped-bucket.zip
Finally, clean up:
rm -rf /tmp/bucket-contents
rm /tmp/zipped-bucket.zip
gsutil rm gs://my-uniquename-bucket/zipped-bucket.zip
After these steps, you'll have a zipped-bucket.zip file in your local system that you can unzip with the tool of your choice.
Note that this might not work if you have too much data in your bucket and the Cloud Shell environment can't store all the data, but you could repeat the same steps on folders instead of buckets to have a manageable size.
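For example, to process one folder of the bucket at a time instead of the whole bucket (the folder name is just a placeholder):
gsutil -m cp -R gs://my-uniquename-bucket/some-folder /tmp/bucket-contents/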

AWS folder location installed using anaconda

I installed the AWS CLI using Anaconda. I am using Linux Mint 19.1. I am not sure where to find the .aws folder, because I used Anaconda instead of pip install awscli --upgrade --user.
Here are the paths I see using find . -iname "aws":
./anaconda3/share/terminfo/a/aws
./anaconda3/pkgs/ncurses-6.1-hfc679d8_2/share/terminfo/a/aws
./anaconda3/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws
./anaconda3/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-s3/include/aws
./anaconda3/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-kinesis/include/aws
./anaconda3/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-core/include/aws
./anaconda3/pkgs/awscli-1.16.92-py36_0/bin/aws
./anaconda3/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws
./anaconda3/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-s3/include/aws
./anaconda3/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-kinesis/include/aws
./anaconda3/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-core/include/aws
./anaconda3/lib/python3.6/site-packages/tensorflow/include/external/aws
./anaconda3/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-s3/include/aws
./anaconda3/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-kinesis/include/aws
./anaconda3/lib/python3.6/site-packages/tensorflow/include/external/aws/aws-cpp-sdk-core/include/aws
./anaconda3/bin/aws
I checked these two paths:
./anaconda3/pkgs/awscli-1.16.92-py36_0/bin/aws
./anaconda3/bin/aws
using find . -iname "config" | grep aws and find . -iname "credentials" | grep aws, but neither of them contains a config or credentials file.
So where can I find the .aws folder, which wasn't created? I can confirm that aws is installed, as aws --version returns aws-cli/1.16.92 Python/3.6.7 Linux/4.15.0-43-generic botocore/1.12.82
When I was reading about the configuration and credential files here https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html, I tried to find these files but was unable to.
Following @bwest's suggestion, I followed the steps on the previous page here https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html using the command aws configure, and ran my find again.
This time I'm able to locate the .aws folder, config and credentials files.
So in short, if you install awscli, the .aws folder will not be created automatically; you have to execute aws configure.
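A quick way to see this (the values below are only placeholders):
aws configure
# AWS Access Key ID [None]: AKIA...
# AWS Secret Access Key [None]: ...
# Default region name [None]: us-east-1
# Default output format [None]: json
ls -a ~/.aws
# config  credentials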

Where can I find the folder which I downloaded from gcloud bucket

Using the gcloud shell, I downloaded my whole bucket, but I couldn't find the downloaded files.
I used the command
gsutil -m cp -R gs://bucket/* .
P.S. Please don't downvote this post. If I asked something wrong, let me know in the comments and I will learn how to ask a question correctly and save your time. Thanks
You used the command gsutil cp, as documented here:
https://cloud.google.com/storage/docs/gsutil/commands/cp
The parameters for this command are:
gsutil cp [OPTION]... src_url dst_url
So you used the option -m to perform a parallel (multi-threaded/multi-processing) copy.
Then you also added -R to recursively traverse all directories in your bucket.
As the "destination URL" you entered ".", which specifies the current working directory.
So your files should be located in your home directory, or in whichever directory you had switched to with the cd command inside your command window.
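To see exactly where you are and what was copied, you can run:
pwd    # prints the directory the files were downloaded into
ls     # lists the copied bucket contents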
It would download to the directory you were in when you ran the command. If you never changed the directory using the cd command, it should be in your home directory. On a Mac, that would be Macintosh HD > Users > YourName.

AWS Lambda Error: Unzipped size must be smaller than 262144000 bytes

I am developing a Lambda function which uses the ResumeParser library written in Python 2.7. But when I deploy this function, including the library, to AWS, it throws the following error:
Unzipped size must be smaller than 262144000 bytes
Perhaps you did not exclude development packages, which made your file grow that big.
In my case (for NodeJS), I was missing the following in my serverless.yml:
package:
  exclude:
    - node_modules/**
    - venv/**
See if there is something similar for Python in your case.
This is a hard limit which cannot be changed:
AWS Lambda Limit Errors
Functions that exceed any of the limits listed in the previous limits tables will fail with an exceeded limits exception. These limits are fixed and cannot be changed at this time. For example, if you receive the exception CodeStorageExceededException or an error message similar to "Code storage limit exceeded" from AWS Lambda, you need to reduce the size of your code storage.
You need to reduce the size of your package. If you have large binaries, place them in S3 and download them on bootstrap. Likewise for dependencies: you can pip install or easy_install them from an S3 location, which will be faster than pulling from the pip repos.
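A sketch of installing a dependency from an S3 location (the bucket and archive names are hypothetical, and the object must be readable, e.g. public or via a presigned URL):
pip install https://my-bucket.s3.amazonaws.com/packages/mylib-1.0.tar.gz -t .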
The best solution to this problem is to deploy your Lambda function using a Docker container that you've built and pushed to AWS ECR. Lambda container images have a limit of 10 GB.
Here's an example using the Python flavor of the AWS CDK:
from aws_cdk import aws_lambda as _lambda

self.lambda_from_image = _lambda.DockerImageFunction(
    scope=self,
    id="LambdaImageExample",
    function_name="LambdaImageExample",
    code=_lambda.DockerImageCode.from_image_asset(
        directory="lambda_funcs/LambdaImageExample"
    ),
)
An example Dockerfile contained in the directory lambda_funcs/LambdaImageExample alongside my lambda_func.py and requirements.txt:
FROM amazon/aws-lambda-python:latest
LABEL maintainer="Wesley Cheek"
RUN yum update -y && \
yum install -y python3 python3-dev python3-pip gcc && \
rm -Rf /var/cache/yum
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY lambda_func.py ./
CMD ["lambda_func.handler"]
Run cdk deploy and the Lambda function will be automagically bundled into an image along with its dependencies specified in requirements.txt, pushed to an AWS ECR repository, and deployed.
This Medium post was my main inspiration
Edit:
(More details about this solution can be found in my Dev.to post here)
A workaround that worked for me:
Install pyminifier:
pip install pyminifier
Go to the library folder that you want to zip. In my case, I wanted to zip the site-packages folder in my virtual env, so I created a site-packages-min folder at the same level as site-packages. Run the following shell script to minify the Python files and create an identical structure in the site-packages-min folder. Then zip and upload these files to S3.
#!/bin/bash
for f in $(find site-packages -name '*.py')
do
ori=$f
res=${f/site-packages/site-packages-min}
filename=$(echo $res| awk -F"/" '{print $NF}')
echo "$filename"
path=${res%$filename}
mkdir -p $path
touch $res
pyminifier --destdir=$path $ori >> $res || cp $ori $res
done
HTH
As stated by Greg Wozniak, you may just have imported useless directories like venv and node_modules.
package.exclude is now deprecated and was removed in Serverless 4; you should use package.patterns instead:
package:
  patterns:
    - '!node_modules/**'
    - '!venv/**'
In case you're using CloudFormation, make sure the 'CodeUri' property in your template YAML file includes only your necessary code files and does not contain things like the .aws-sam directory (which is big), etc.