Pain points using the AWS CLI to deploy a Python script to a Lambda function

My plan is simple; the following is my process:
OpenAWSCloudShell --> CreateFolder --> InstallPythonEnv+Packages --> Zip --> UploadToAmazonS3 --> CreateLambdaLayer+S3ZipLink --> AddLayer
CreateLambdaFunction --> AddLayer
python3 -m venv [venv-name]
I noticed that the Python environment in AWS CloudShell is only 3.7 (created with the command above), and it always causes package-compatibility issues.
I have a very simple function to test in AWS Lambda; the script uses only two libraries, yfinance and pymysql. I downgraded all the packages to the Python 3.7 environment, and I still get an error about the task timing out.
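A common source of these compatibility problems is building the layer against a Python version other than the function's runtime. As a rough sketch (the bucket, key, layer name, and python3.9 runtime below are made-up placeholders), the layer can be built with pip into a python/ directory, which is the prefix Lambda expects for Python layers, and then published with boto3:

import shutil
import subprocess
import sys

import boto3

# Install the dependencies into build/python/ so the zip has the
# python/ prefix that Lambda layers require for Python packages.
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "--target", "build/python", "yfinance", "pymysql"],
    check=True,
)

# Zip the build directory; this produces layer.zip with python/ at its root.
shutil.make_archive("layer", "zip", root_dir="build")

# Upload the zip to S3 and publish it as a layer version.
s3 = boto3.client("s3")
s3.upload_file("layer.zip", "my-bucket", "layers/layer.zip")  # placeholder bucket/key

lambda_client = boto3.client("lambda")
lambda_client.publish_layer_version(
    LayerName="yfinance-pymysql",  # placeholder name
    Content={"S3Bucket": "my-bucket", "S3Key": "layers/layer.zip"},
    CompatibleRuntimes=["python3.9"],  # should match the function's runtime
)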
Details
The area below shows the last 4 KB of the execution log.
{
"errorMessage": "2023-02-19T16:35:14.042Z 0574b598-6e47-4e6c-b11a-a96b4db299da Task timed out after 3.01 seconds"
}
Summary
Init duration: 1000.93 ms
Duration: 3013.29 ms
Billed duration: 3000 ms
Resources configured: 128 MB
Max memory used: 120 MB
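The reported duration of 3013.29 ms lines up with Lambda's default 3-second timeout, so a first step worth trying is simply raising the timeout and memory. A minimal boto3 sketch (the function name and values are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# Raise the timeout and memory; 3 s and 128 MB are the defaults and can be
# tight for a cold import of yfinance plus a MySQL connection.
lambda_client.update_function_configuration(
    FunctionName="my-stock-function",  # placeholder name
    Timeout=30,        # seconds
    MemorySize=512,    # MB
)

If the function still times out after that, the usual suspect is a database connection that never completes (for example, a Lambda inside a VPC with no route to the MySQL host) rather than the packages themselves.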

Related

Amplify socket connection error while executing push or publish

I'm trying to create a new environment for our app, and I go over the initial steps where you select the env name and those details. I run:
amplify env add mynewenv
But then neither amplify push nor amplify publish works reliably. I was able to get a good run of amplify push 2 or 3 times, but 90% of the time it failed, and amplify publish failed 100% of the time. These are the two errors I got while running these commands:
Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
An error occurred during the push operation: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
An error occurred when pushing the resources to the cloud
write EPROTO
An error occurred during the push operation: write EPROTO
I always get either one or the other, not both at the same time. Here is some information about my setup:
OS: Ubuntu MATE Release 20.04 LTS (Focal Fossa) 64-bit, Kernel Linux 5.4.0-37-generic x86_64, MATE 1.24.0
Amplify version: 4.21.4
npm version: 6.14.5
aws --version: aws-cli/2.0.24 Python/3.7.3 Linux/5.4.0-37-generic botocore/2.0.0dev28
AWS Region: us-west-2
node version: v14.4.0
I noticed something weird while running these amplify commands. This is an output from speedtest-cli before running amplify push:
Download: 50.30 Mbit/s
Upload: 5.91 Mbit/s
And this is an output while running amplify push:
Download: 8.30 Mbit/s
Upload: 0.72 Mbit/s
Any ideas?
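To quantify that bandwidth drop, one option is to sample throughput on a timer while amplify push runs. A rough sketch assuming the speedtest module from the speedtest-cli package is installed (interval and sample count are arbitrary):

import time
import speedtest  # provided by the speedtest-cli package

# Sample download/upload throughput every 30 seconds; start this in a
# separate terminal, then kick off `amplify push` and compare the samples.
st = speedtest.Speedtest()
st.get_best_server()

for i in range(10):  # arbitrary number of samples
    down = st.download() / 1e6  # bits/s -> Mbit/s
    up = st.upload() / 1e6
    print(f"sample {i}: download {down:.2f} Mbit/s, upload {up:.2f} Mbit/s")
    time.sleep(30)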

AWS EMR pyspark notebook fails with `Failed to run command /usr/bin/virtualenv (...)`

I have created a basic EMR cluster in AWS, and I'm trying to use the Jupyter Notebooks provided through the AWS Console. Launching the notebooks seems to work fine, and I'm also able to run basic python code in notebooks started with the pyspark kernel. Two variables are set up in the notebook: spark is a SparkSession instance, and sc is a SparkContext instance. Displaying sc yields <SparkContext master=yarn appName=livy-session-0> (the output can of course vary slightly depending on the session).
The problem arises once I perform operations that actually hit the spark machinery. For example:
sc.parallelize(list(range(10))).map(lambda x: x**2).collect()
I am no spark expert, but I believe this code should distribute the integers from 0 to 9 across the cluster, square them, and return the results in a list. Instead, I get a lengthy stack trace, mostly from the JVM, but also some python components. I believe the central part of the stack trace is the following:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 116, ip-XXXXXXXXXXXXX.eu-west-1.compute.internal, executor 17): java.lang.RuntimeException: Failed to run command: /usr/bin/virtualenv -p python3 --system-site-packages virtualenv_application_1586243436143_0002_0
The full stack trace is here.
A bit of digging in the AWS portal led me to log output from the nodes. stdout from one of the nodes includes the following:
The path python3 (from --python=python3) does not exist
I tried running the /usr/bin/virtualenv command on the master node manually (after logging in through ssh), and that worked fine, but the error was of course still present afterwards.
While this error occurs most of the time, I was able to get this working in one session, where I could run several operations against the spark cluster as I was expecting.
Technical information on the cluster setup:
emr-6.0.0
Applications installed are "Ganglia 3.7.2, Spark 2.4.4, Zeppelin 0.9.0, Livy 0.6.0, JupyterHub 1.0.0, Hive 3.1.2". Hadoop is also included.
3 nodes (one of them as master), all r5a.2xlarge.
Any ideas what I'm doing wrong? Note that I am completely new to EMR and Spark.
Edit: Added the stdout log and information about running the virtualenv command manually on the master node through ssh.
I have switched to using emr-5.29.0, which seems to resolve the problem. Perhaps this is an issue with emr-6.0.0? In any case, I have a functional workaround.
The issue for me was that the virtualenv was being made on the executors with a python path that didn't exist. Pointing the executors to the right one did the job for me:
"spark.pyspark.python": "/usr/bin/python3.7"
Here is how I reconfigured the Spark app at the beginning of the notebook:
{"conf":{"spark.pyspark.python": "/usr/bin/python3.7",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type": "native",
"spark.pyspark.virtualenv.bin.path":"/usr/bin/virtualenv"}
}
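Assuming this JSON is applied through the notebook's %%configure -f magic (the sparkmagic mechanism EMR notebooks use to restart the Livy session with new settings), a quick sanity check that the executors can actually see the configured interpreter is a tiny job like the one below, run in a pyspark cell where sc already exists:

import sys

def interpreter_info(_):
    # Runs on an executor: report which interpreter paths exist there.
    import os
    return {
        "python3": os.path.exists("/usr/bin/python3"),
        "python3.7": os.path.exists("/usr/bin/python3.7"),
    }

print("driver interpreter:", sys.executable)
print("executor paths:", sc.parallelize([0], 1).map(interpreter_info).first())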

sam local invoke of a Lambda with a local layer: "unable to import package" error

When using 'sam local invoke' to invoke a Lambda locally that also relies on a locally built layer, the function cannot find the module that is part of the layer.
I get an "unable to import package" error while invoking my Lambda locally using 'sam local invoke FUNCTIONNAME'. Those packages are already present in the layer (a zipped folder). Is there any way to test these Lambdas locally when their Python dependencies are zipped into a layer?
START RequestId: 083247f2-3011-428c-a529-50eba6d668f2 Version: $LATEST
Unable to import module 'getnext': No module named 'apiconfig'
END RequestId: 083247f2-3011-428c-a529-50eba6d668f2
REPORT RequestId: 083247f2-3011-428c-a529-50eba6d668f2 Duration: 12 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 19 MB
The 'apiconfig' module is already present in layer.zip.
My expectation is that all the layer modules get downloaded to some temp location during 'sam local invoke' so that it can pick up all the packages and run the Lambda. I therefore tried the --layer-cache-basedir DIRECTORY flag as well, but I still get the same error, although the Dockerfile is generated at the DIRECTORY path. Also, even though I have not provided the --skip-pull-image flag, it still shows 'requested to skip pull images'. I added the --force-image-build flag too.
To support Lambda layers, SAM CLI replicates the AWS layer process locally by downloading all associated layers and caching them on your development machine. This happens the first time you run sam local invoke or the first time you execute your Lambda functions using sam local start-lambda or sam local start-api.
Two specific flags in SAM CLI are helpful when you're working with Lambda layers locally: --layer-cache-basedir and --force-image-build (see the AWS docs).
For Googlers:
'Requested to skip pull images' does not matter in this case
Make sure your local layer is a directory, not a zip file
Make sure the modules exist in the /opt folder by checking from code inside the Lambda function (see the sketch below)
Make sure you are referring to the right path in template.yaml
I did the above in sequence and found it was a silly mistake.
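For the /opt check mentioned in the list, a throwaway handler along these lines is usually enough; this is only a sketch, and the apiconfig import mirrors the module from this question:

import os

def lambda_handler(event, context):
    # Layers are mounted under /opt when the function runs; Python packages
    # from a layer normally end up in /opt/python (or /opt/python/lib/...).
    for root in ("/opt", "/opt/python"):
        print(root, "->", os.listdir(root) if os.path.isdir(root) else "missing")
    import apiconfig  # should succeed once the layer is wired up correctly
    return {"statusCode": 200}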

AWS Glue Python shell jobs: No space left on device

I'm running a Glue job with the following configuration:
Type: Python Shell
Python version: Python 2 (Glue 1.0) [I'd created these jobs without a version and, per the documentation, they should default to Glue 0.9]
The job fails to initialize with the following error:
No space left on device: 'build/bdist.linux-x86_64/egg/pycparser
Has anyone else encountered the same error or have any potential resolutions?
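Since the Glue and Python versions of a Python shell job are easy to leave implicit, it may help to pin them explicitly when (re)creating the job. A boto3 sketch (the job name, role ARN, and script location are placeholders):

import boto3

glue = boto3.client("glue")

# Explicitly pin the Glue version and Python version instead of relying on
# the defaults; Python shell jobs use the 'pythonshell' command name.
glue.create_job(
    Name="my-python-shell-job",                          # placeholder
    Role="arn:aws:iam::123456789012:role/GlueJobRole",   # placeholder
    Command={
        "Name": "pythonshell",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
        "PythonVersion": "3",
    },
    GlueVersion="1.0",
    MaxCapacity=0.0625,  # Python shell jobs accept 0.0625 or 1 DPU
)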

How to tune/troubleshoot/optimize docker block i/o on AWS

I have the following docker containers that I have set up to test my web application:
Jenkins
Apache 1 (serving a laravel app)
Apache 2 (serving a legacy codeigniter app)
MySQL (accessed by both Apache 1 and Apache 2)
Selenium HUB
Selenium Node — ChromeDriver
The Jenkins job runs a behat command on Apache 1, which in turn connects to Selenium Hub, which has a ChromeDriver node to actually hit the two apps: Apache 1 and Apache 2.
The whole system is running on an EC2 t2.small instance (1 core, 2GB RAM) with AWS linux.
The problem
The issue I am having is that if I run the pipeline multiple times, the first few times it runs just fine (the behat stage takes about 20s), but on the third and consecutive runs, the behat stage starts slowing down (taking 1m30s) and then failing after 3m or 10m or whenever I lose patience.
If I restart the docker containers, it works again, but only for another 2-4 runs.
Clues
Monitoring docker stats each time I run the Jenkins pipeline, I noticed that the Block I/O, and specifically the 'I' (input), was growing exponentially after the first few runs.
For example, the docker stats screenshots taken after runs 1 through 4 (not reproduced here) show the growth clearly.
The Block I/O for the chromedriver container is 21GB and the driver hangs. While I might expect the Block I/O to grow, I wouldn't expect it to grow exponentially as it seems to be doing. It's like something is... exploding.
The same docker configuration (using docker-compose) runs flawlessly every time on my personal MacBook Pro. Block I/O does not 'explode'. I constrain Docker to only use 1 core and 2GB of RAM.
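To make the Block I/O growth easier to compare across runs, the docker stats snapshots can also be captured programmatically. A small sketch, assuming the docker CLI is on the host's PATH, that logs per-container Block I/O after each pipeline run:

import subprocess
import time

def log_block_io(label):
    # --no-stream takes a single snapshot; the format string keeps only
    # the container name and its cumulative Block I/O (read / write).
    out = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}}\t{{.BlockIO}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"=== {label} ({time.strftime('%H:%M:%S')}) ===")
    print(out)

log_block_io("after run 1")  # call again after each subsequent run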
What I've tried
This situation has sent me down the path of learning a lot more about docker, filesystems and memory management, but I'm still not resolving the issue. Some of the things I have tried:
Memory
I set mem_limit options on all containers and tuned them so that during any given run, the memory would not reach 100%. Memory usage now seems fairly stable, and never 'blows up'.
Storage Driver
The default for AWS Linux Docker is devicemapper in loop-lvm mode. After reading this doc
https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-docker-with-devicemapper
I switched to the suggested direct-lvm mode.
docker-compose restart
This does indeed 'reset' the issue, allowing me to get a few more runs in, but it doesn't last. After 2-4 runs, things seize up and the tests start failing.
iotop
Running iotop on the host shows that reads are going through the roof.
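To complement iotop, host-level disk counters can be sampled around each run so the growth shows up as a per-run delta. A sketch assuming the psutil package is installed on the instance:

import psutil

# Take one snapshot before a pipeline run and one after; the difference is
# the total bytes read/written on the host while the run was in progress.
before = psutil.disk_io_counters()
# ... trigger the Jenkins pipeline / behat stage here ...
after = psutil.disk_io_counters()

print("read  MB:", (after.read_bytes - before.read_bytes) / 1e6)
print("write MB:", (after.write_bytes - before.write_bytes) / 1e6)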
My Question...
What is happening that causes the Block I/O to grow exponentially? I'm not clear whether it's Docker, Jenkins, Selenium or ChromeDriver that is causing the problem. My first guess is ChromeDriver, although the other containers are also showing signs of 'exploding'.
What is a good approach to tuning a system like this with multiple moving parts?
Additional Info
My chromedriver container has the following environment set in docker-compose:
- SE_OPTS=-maxSession 6 -browser browserName=chrome,maxInstances=3
docker info:
$ docker info
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 5
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 4.862 GB
Data Space Total: 20.4 GB
Data Space Available: 15.53 GB
Metadata Space Used: 2.54 MB
Metadata Space Total: 213.9 MB
Metadata Space Available: 211.4 MB
Thin Pool Minimum Free Space: 2.039 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.51-40.60.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.956 GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/