Running a Python script stored in S3 from a Lambda function - amazon-web-services

I have a Python script stored in S3.
That script reads an Excel source file, also placed in S3, and performs some transformations.
I have created a Lambda function that gets triggered by a PUT event on the S3 bucket (i.e. whenever a source file lands in the S3 folder).
The requirement is to run that Python script from the same Lambda function, or to have the script's logic configured within that Lambda function.
Thanks in advance.

You can download the Python file to Lambda's /tmp/ temporary storage. Then you can import the file inside your code using an import statement. Make sure the import statement is executed only after you have downloaded the file to /tmp/.
You can also have a look here at other methods for running a new script from within a script.
EDIT:
Here's how you can download to /tmp/
import boto3

s3 = boto3.client('s3')
# Download the script from S3 into Lambda's writable /tmp/ directory
s3.download_file('bucket_name', 'filename.py', '/tmp/filename.py')
Make sure your Lambda execution role has permission to read from the S3 bucket.
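To actually import the downloaded file, /tmp/ also has to be on the module search path. A minimal sketch of the whole flow, assuming the bucket, key, and entry-point name from the snippet above are placeholders (run_transformations is a hypothetical function the downloaded script would expose):
import sys
import importlib

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # Download the script before trying to import it
    s3.download_file('bucket_name', 'filename.py', '/tmp/filename.py')

    # Make /tmp/ importable, then load the downloaded module by name
    sys.path.insert(0, '/tmp/')
    module = importlib.import_module('filename')

    # Call whatever entry point the downloaded script exposes
    # (run_transformations is a hypothetical name for illustration)
    return module.run_transformations(event)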

Related

Can't import packages from layers in AWS Lambda

I know this question exists in several places, but even after following different guides/answers I still can't get it to work, and I have no idea what I'm doing wrong. I have a Python Lambda function on AWS where I need to do import requests. This is my approach so far.
Create a .zip archive of the packages. Locally I run:
pip3 install requests -t ./
zip -r okta_layer.zip .
Upload the .zip archive as a Lambda layer:
I go to Lambda layers in the AWS console and create a new layer from this .zip file.
I go to my Python Lambda function and add the layer to it directly from the console. I can now see the layer under "Layers" for the function. But when I run the function it still complains:
Unable to import module 'lambda_function': No module named 'requests'
I solved the problem. Apparently the .zip file needs a "python" folder at its root, and all the packages have to sit inside that "python" folder.
I only had the packages at the top level of the zip, without a "python" folder ...
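In other words, the uploaded archive should unpack to a layout like the following (okta_layer.zip is the name from the question; requests' dependencies are abbreviated with ...):
okta_layer.zip
    python/
        requests/
        urllib3/
        ...
You can get that layout by installing into a python/ directory first (pip3 install requests -t python/) and then zipping from the parent directory (zip -r okta_layer.zip python), which is the same pattern the boto3 layer steps later in this thread use.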

AWS Lambda layer has no execute permission

I created a Lambda layer for the Python runtime (3.6 and 3.7 compatible) that contains a binary executable (texlive).
But when I try to run it through subprocess.run, it says that it has no execute permission!
How can I make it so this layer has execute permission? I zipped the layer files on Windows 10, so I'm not sure how to add the Linux execute permission.
Also, as far as I know, unzipping a file "resets" the permissions, so if AWS is not setting the execute permission when unzipping my layer, what can I do?
By the way, I'm uploading my layer via the AWS console.
I installed WSL on Windows 10 and zipped up my layer using the zip executable from within Ubuntu:
zip -r importtime_wrapper_layer.zip .
It created a zip file that retained the 755 file permissions on my script.
I was able to verify with 7-Zip that the correct attributes were present, and the Lambda runtime was able to execute it.
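If re-zipping from Linux isn't an option, another common workaround (not part of the answer above, just a general pattern) is to copy the binary out of the read-only layer mount under /opt into /tmp and set the execute bit at runtime. A sketch, where /opt/bin/mytool is a hypothetical path inside the layer:
import os
import shutil
import stat
import subprocess

def handler(event, context):
    # Layers are mounted read-only under /opt, so copy the binary to /tmp first
    src = '/opt/bin/mytool'   # hypothetical path inside the layer
    dst = '/tmp/mytool'
    shutil.copyfile(src, dst)

    # Add the execute bit that was lost when the layer was zipped on Windows
    os.chmod(dst, os.stat(dst).st_mode | stat.S_IEXEC)

    # The copied binary can now be executed
    result = subprocess.run([dst, '--version'], stdout=subprocess.PIPE)
    return result.stdout.decode()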

AWS Lambda Console - Upgrade boto3 version

I am creating a DeepLens project to recognise people when one of a select group of people is scanned by the camera.
The project uses a Lambda, which processes the images and calls the AWS 'rekognition' API.
When I trigger the API from my local machine, I get a good response.
When I trigger the API from the AWS console, I get a failed response.
Problem
After much digging, I found that 'boto3' (the AWS Python library) is at version:
1.9.62 - on my local machine
1.8.9 - on the AWS console
Question
Can I upgrade the 'boto3' library version in the AWS Lambda console? If so, how?
If you don't want to package a more recent boto3 version with your function, you can download boto3 on each invocation of the Lambda. Remember that /tmp/ is the only directory Lambda allows you to write to, so you can use it to temporarily install boto3:
import sys
from pip._internal import main

# Install boto3 into /tmp/ at runtime
# (note: pip._internal is not a public API and can change between pip versions)
main(['install', '-I', '-q', 'boto3', '--target', '/tmp/', '--no-cache-dir', '--disable-pip-version-check'])

# Make the freshly installed packages take precedence over the bundled ones
sys.path.insert(0, '/tmp/')

import boto3
from botocore.exceptions import ClientError

def handler(event, context):
    print(boto3.__version__)
You can achieve the same thing by packaging your Python function with its dependencies, or by using a virtual environment.
Those are the available options; beyond that, you can also try contacting the Amazon team to see if they can help you with the upgrade.
I know you're asking for a solution through the console, but as far as I know that is not possible.
To solve this you need to provide the boto3 version you require to your Lambda (either with the solution from user1998671 or with what Shivang Agarwal is proposing). A third option is to provide the required boto3 version as a layer for the Lambda. The big advantage of a layer is that you can reuse it across all your Lambdas.
This can be achieved by following the guide from AWS (the following is mostly copied from the linked AWS guide):
IMPORTANT: Replace boto3-mylayer with a name that suits you.
Create a lib folder by running the following command:
LIB_DIR=boto3-mylayer/python
mkdir -p $LIB_DIR
Install the library to LIB_DIR by running the following command:
pip3 install boto3 -t $LIB_DIR
Zip all the dependencies to /tmp/boto3-mylayer.zip by running the following command:
cd boto3-mylayer
zip -r /tmp/boto3-mylayer.zip .
Publish the layer by running the following command:
aws lambda publish-layer-version --layer-name boto3-mylayer --zip-file fileb:///tmp/boto3-mylayer.zip
The command returns the new layer's Amazon Resource Name (ARN), similar to the following one:
arn:aws:lambda:region:$ACC_ID:layer:boto3-mylayer:1
To attach this layer to your Lambda, execute the following:
aws lambda update-function-configuration --function-name <name-of-your-lambda> --layers <layer ARN>
To verify the boto3 version in your Lambda, simply add the following two print statements (they assume import boto3 and import botocore at the top of your handler module):
print(boto3.__version__)
print(botocore.__version__)
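For example, a minimal handler for checking which versions the attached layer actually provides might look like this (the handler name is up to you):
import boto3
import botocore

def handler(event, context):
    # If the layer is attached correctly, these report the layer's versions,
    # not the ones bundled with the Lambda runtime
    print(boto3.__version__)
    print(botocore.__version__)
    return {'boto3': boto3.__version__, 'botocore': botocore.__version__}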

How can I run a Dataflow pipeline with a setup file using Cloud Composer/Apache Airflow?

I have a working Dataflow pipeline that first runs setup.py to install some local helper modules. I now want to use Cloud Composer/Apache Airflow to schedule the pipeline. I've created my DAG file and placed it in the designated Google Storage DAG folder along with my pipeline project. The folder structure looks like this:
{Composer-Bucket}/
    dags/
        --DAG.py
        Pipeline-Project/
            --Pipeline.py
            --setup.py
            Module1/
                --__init__.py
            Module2/
                --__init__.py
            Module3/
                --__init__.py
The part of my DAG that specifies the setup.py file looks like this:
resumeparserop = dataflow_operator.DataFlowPythonOperator(
    task_id="resumeparsertask",
    py_file="gs://{COMPOSER-BUCKET}/dags/Pipeline-Project/Pipeline.py",
    dataflow_default_options={
        "project": {PROJECT-NAME},
        "setup_file": "gs://{COMPOSER-BUCKET}/dags/Pipeline-Project/setup.py"})
However, when I look at the logs in the Airflow Web UI, I get the error:
RuntimeError: The file gs://{COMPOSER-BUCKET}/dags/Pipeline-Project/setup.py cannot be found. It was specified in the --setup_file command line option.
I am not sure why it is unable to find the setup file. How can I run my Dataflow pipeline with the setup file/modules?
If you look at the code for DataFlowPythonOperator, it looks like the main py_file can be a file inside a GCS bucket and is localized by the operator prior to executing the pipeline. However, I do not see anything like that for dataflow_default_options. It appears that the options are simply copied and formatted.
Since the GCS dags folder is mounted on the Airflow workers using Cloud Storage FUSE, you should be able to access the file locally via the "dags_folder" configuration setting.
i.e. you could do something like this:
import os

from airflow import configuration
# ...
# Resolve setup.py relative to the locally mounted dags folder
LOCAL_SETUP_FILE = os.path.join(
    configuration.get('core', 'dags_folder'), 'Pipeline-Project', 'setup.py')
You can then use the LOCAL_SETUP_FILE variable for the setup_file property in the dataflow_default_options.
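Put together with the operator from the question, this would look roughly as follows (a sketch: the task id, paths, and {PROJECT-NAME} placeholder are carried over from the question, and the import path assumes the Airflow 1.x contrib location of the operator):
import os

from airflow import configuration
from airflow.contrib.operators import dataflow_operator

# Resolve setup.py through the locally mounted dags folder
LOCAL_SETUP_FILE = os.path.join(
    configuration.get('core', 'dags_folder'), 'Pipeline-Project', 'setup.py')

resumeparserop = dataflow_operator.DataFlowPythonOperator(
    task_id="resumeparsertask",
    py_file="gs://{COMPOSER-BUCKET}/dags/Pipeline-Project/Pipeline.py",
    dataflow_default_options={
        "project": "{PROJECT-NAME}",          # placeholder from the question
        "setup_file": LOCAL_SETUP_FILE})      # local path instead of gs:// URI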
Do you run Composer and Dataflow with the same service account, or are they separate? In the latter case, have you checked whether Dataflow's service account has read access to the bucket and object?

AWS Lambda, Python: Unable to import module that is definitely within Zip package (xmlsec)

I am using the Python module xmlsec in my Lambda function. The import looks like import dm.xmlsec.binding as xmlsec. The proper directory structure exists: at the root of the archive there is dm/xmlsec/binding/__init__.py, and the rest of the module is there. However, when executing the function on Lambda, I get the error "No module named dm.xmlsec.binding".
I have built many Python 2.7 Lambda functions in the same way as this one with no issues. I install all of the needed Python modules into my build directory, with the Lambda function at the root. I then zip the package recursively and update the existing function with the resulting archive using the AWS CLI. I've also tried manually uploading the archive in the console, with the same result.
I was honestly expecting some trouble with this module, but I did expect Lambda to at least see it. What is going on?