Unable to import module 'lambda_function': No module named '_awscrt'

I'm working from the article Asynchronous Amazon Transcribe Streaming SDK for Python.
I'm trying to create a Lambda layer for the required libraries.
I used the following command:
pip3 install amazon-transcribe aiofile -t .
But I get the following error when I use the layer in my lambda function:
Unable to import module 'lambda_function': No module named '_awscrt'
The same setup works fine locally in a virtual environment, so I'm not sure what the exact issue is.
I even tried installing awscrt separately, but it didn't work.
Any kind of help will be greatly appreciated. Thanks!

Lambda layer .zip files need to follow a specific directory structure. Look at this section of the documentation to see how it should be structured for Python. This might be your problem.
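For example, here is a minimal sketch of building such a layer (the folder and zip names are arbitrary; the key point is the python/ prefix that Lambda expects for Python layer content):

# Install the packages under python/, the directory prefix Lambda
# expects for Python layers, then zip that folder.
mkdir -p layer/python
pip3 install amazon-transcribe aiofile -t layer/python/
cd layer
zip -r ../lambda-layer.zip python/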

I built the layer on Amazon Linux and it worked fine!
The troubleshooting guide in the repo helped:
The caio Linux implementation works normally on modern Linux kernel versions and file systems, so you may have problems specific to your environment. It's not a bug, and it might be resolved in one of these ways:
1. Upgrade the kernel.
2. Use a compatible file system.
3. Use the thread-based or pure-Python implementation.
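If you don't have an Amazon Linux machine handy, a sketch of an alternative (assuming Docker; the lambci build image also comes up in an answer further down) is to run pip inside a Lambda-like Linux container, so the compiled _awscrt extension matches the Lambda runtime rather than your local OS:

# Install into python/ inside a Lambda-like container, then zip it
# as a layer; adjust the image tag to your function's runtime.
docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.8 \
    pip install amazon-transcribe aiofile -t python/
zip -r lambda-layer.zip python/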

Related

Versions of Python & Spark to work with VS Code Notebooks

I'm developing scripts for AWS Glue and trying to mimic the development environment as closely as possible to their specs here. Since it's a bit costly to run a notebook server/development endpoint, I set everything up on my local machine instead and develop scripts in VS Code notebooks.
There are some troubles with the notebook setup due to incompatible versions of the installed Python & Spark.
For Python, I went through a rough cleanup, and its version is now 3.8.3.
For Spark, I installed version 2.4.3 manually, since I plan to use Scala alongside it later. I installed the findspark package to load that version as expected.
And it doesn't work! The error was TypeError: an integer is required (got type bytes).
I've searched around, and people said to downgrade to Python 3.7 using pyenv. I got 3.7.7 installed but still had the same error.
As a last resort, I tried pip install pyspark; that gives Spark 3.0.0, which works, but it's not what I need.
I hope someone has experience with this.
A better approach would be to install the Glue dependencies in Docker, then attach to that container from VS Code to mimic the exact Glue local dev environment.
I've written a blog post about this if you'd like to refer to it:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1
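A rough sketch of that setup (assuming Docker is installed; amazon/aws-glue-libs is AWS's public image for local Glue development, and the Glue 1.0 tag bundles Spark 2.4.3, matching the version above):

# Pull the Glue 1.0 local-development image (bundles compatible
# Python and Spark versions) and open a shell inside it.
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
docker run -it -v "$PWD":/work amazon/aws-glue-libs:glue_libs_1.0.0_image_01 bash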

How can I import regex on AWS Lambda

I am getting the following error:
Unable to import module '': No module named 'regex._regex'
The AWS Lambda deployment package runs just fine without the import htmldate statement (the module I want to use), which in turn requires regex.
Also the code runs fine locally.
So this seems to be a problem running regex on AWS Lambda.
A new version of htmldate makes some of the dependencies optional; regex is one of them. That should solve the problem. (FYI: I'm the main developer of the package.)
If it runs locally and not in the Lambda, it may be an issue with the package installation: regex ships a compiled extension, so it has to be installed on a platform matching the Lambda environment. You may want to install your requirements.txt via Docker, replicating the Lambda environment during installation.
This docker image can be used to help:
https://hub.docker.com/r/lambci/lambda/
There are some examples specified here: https://github.com/lambci/docker-lambda#build-examples
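For instance, a sketch of that installation step (the tag assumes the python3.8 runtime; pick the one matching your function):

# Install dependencies inside the Lambda-like build image so that
# compiled extensions such as regex._regex match the Lambda runtime.
docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.8 \
    pip install -r requirements.txt -t .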

Install bq_helper on Datalab

I know that the BigQuery module is already installed on Datalab. I just want to use the bq_helper module because I learned it on Kaggle.
I did !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and it worked.
But I can't import bq_helper.
Please help. Thanks!
I used Python 2 on Datalab.
I am not familiar with the BigQuery Helper library you shared, but in general, in Datalab, you may need to restart the kernel for newly installed libraries to load properly.
I reproduced the scenario you proposed: installing the library with the command !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and then trying to import it in the notebook using:
from bq_helper import BigQueryHelper
bq_assistant = BigQueryHelper("bigquery-public-data", "github_repos")
bq_assistant.project_name
At first, it did not work and I obtained the same error as you; then I clicked on the Reset Session button and the library was loaded properly.
Some other details that may be relevant if this does not work for you are:
I am also running on Python 2 (although the GitHub page of the library suggests that it was only tested on Python 3.6+).
The Custom metadata parameters in the Datalab GCE instance are: created-with-datalab-version: 20180503 and created-with-sdk-version: 208.0.2.

Amazon Lambda unable to import [python windows .pyd pip]

I am trying to write to my PostgreSQL database from AWS Lambda using the python2.7 runtime. I care very little about how I do this, so if anyone has a different way that works and that I can understand, I'd love to hear it.
The method I'm currently trying is to use psycopg2, as this is the only way I know. In order to do this, I need to upload the psycopg2 module to my environment on AWS Lambda. As per the instructions, I've created a directory with my source and psycopg2 using pip install psycopg2 -t ..\my-project, zipped my-project, and uploaded it.
This is my error message from the AWS Lambda console: Unable to import module 'lambda_function': No module named _psycopg
The code runs on my Windows machine. I think the issue is that when I import psycopg2 on my local Windows machine, the _psycopg module is imported from _psycopg.pyd, and .pyd files are Windows-specific. I may be wrong about this.
I'm really just looking for any way to achieve the desired result described in my first paragraph, but here's a more specific question: how do I tell Windows to pip install and compile psycopg2 without using .pyd files? Is this possible? Do I have something completely wrong?
I know the formatting of this question is a little unorthodox, but I think I've given all the necessary information; let me know if there's anything else I can provide.
I solved the problem by opening an Ubuntu instance in VirtualBox, pip installing the package there, pulling the relevant folders out, and placing them in my-project before zipping and uploading to AWS Lambda.
See these instructions.
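A sketch of those steps on the VM (assuming its Python version matches your Lambda runtime; the folder names follow the question):

# On the Ubuntu VM: install psycopg2 into the project folder so the
# compiled _psycopg extension is a Linux .so rather than a Windows .pyd.
pip install psycopg2 -t my-project/
cd my-project
zip -r ../deployment-package.zip .

Note that psycopg2 built this way links against the VM's libpq, which Lambda may not provide; statically linked builds (for example, the psycopg2-binary wheel) are a common workaround.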

AWS Lambda: How to use tools that must be installed first on Linux?

I understand that AWS Lambda runs at the application layer of an isolated environment.
In many situations, functions need third-party tools that must first be installed on the Linux machine. For example, a media-processing function uses exiftool to extract metadata from images, so I install exiftool first.
Now I want to migrate the media-processing code to AWS Lambda. My question is: how can I use those tools that I would normally have to install on Linux first? My code is written in Java, and exiftool is necessary.
To expand on Daniel's answer, if you wanted to bundle exiftool, you would follow steps 1 and 2 for Unix/Linux platforms from the official install instructions. You would then include exiftool and lib in your function's zip file. To run exiftool you would do something like:
const exec = require('child_process').exec;

exports.handler = (event, context, callback) => {
  // './exiftool' gave me permission denied errors,
  // so run the bundled script through perl explicitly.
  exec('perl exiftool -ver', (error, stdout, stderr) => {
    if (error) {
      callback(`error: ${error}`);
      return;
    }
    callback(null, `stderr: ${stderr} \n stdout: ${stdout}`);
  });
};
Everything your Lambda function executes must be included in the deployment package you upload.
That means if you want to run Java code, you can reference other Java libraries. (Likewise, if you want to run Node.js code, you can reference other Node libraries.)
Regardless of the tools you use, the resulting .zip file must have the following structure:
All compiled class files and resource files at the root level.
All required jars to run the code in the /lib directory.
(source)
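For instance, a hypothetical zip layout (the class names are placeholders; metadata-extractor is one well-known Java EXIF library you might bundle):

com/example/Handler.class
com/example/util/ExifReader.class
lib/metadata-extractor.jar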
Or you can upload a .jar file.
exiftool, on the other hand, is a Perl command-line program. I suspect that on your local machine, you shell out from your Java code and run it.
You cannot do that in AWS Lambda. You need to find a Java package that extracts EXIF information (I am sure there are plenty to choose from) and include that in your deployment package. You cannot install software packages on Lambda.
https://aws.amazon.com/lambda/faqs/
Q: What languages does AWS Lambda support?
AWS Lambda supports code written in Node.js (JavaScript), Python, and Java (Java 8 compatible). Your code can include existing libraries, even native ones. Please read our documentation on using Node.js, Python and Java.
So basically, you can call out to native processes if they are pre-installed, but only from JavaScript and Java as the parent process.
To get a rough idea of what is available, have a look at this list of installed packages:
https://gist.github.com/royingantaginting/4499668
This list won't be 100% accurate; to verify, you would need to look directly at the AMI image (ami-e7527ed7).
exiftool doesn't appear to be installed by default. I doubt the account running the Lambda function would have enough rights to install anything globally, but you could always bundle exiftool with your Node or Java function.
You may also want to have a look at lambdash (https://github.com/alestic/lambdash), which allows you to run commands from your local command line on a remote Lambda instance.
This can now be done using AWS Lambda Layers.
An example of how to prepare a layer for exiftool specifically can be found here:
https://gist.github.com/hughevans/6b8c57839b8194ba910428de4375794a
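The general idea (a sketch only; see the gist for working details): layer contents are extracted under /opt, and /opt/bin is added to the function's PATH, so a layer zip laid out roughly like this makes exiftool callable by name:

exiftool-layer.zip
└── bin/
    ├── exiftool    (the Perl script, marked executable)
    └── lib/        (its bundled Perl libraries)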