ECS Fargate fails randomly without logs, with container ExitCode 139 - amazon-web-services

I am trying to deploy an ETL pipeline on AWS.
The architecture uses a Step Function to manage the whole pipeline: it consists of 10 parallel (independent) Fargate tasks (the pipeline) invoked via a Map state in the Step Function, plus a Lambda (error notifier).
I used Python 3.7 as the base image to build my Docker image. The pipeline deploys successfully, but it sometimes fails randomly with container exit code 139.
After further research, I found out that this is a SIGSEGV error (signal 11 in the Linux kernel, which Docker reports as 128 + 11 = 139), related to invalid memory access.
I am not sure how to rectify this error.

[UPDATE]
Recently found out what the problem was: the image was built on an x86_64 Mac machine and deployed on an x86_64 Ubuntu system, and that difference in build environment is what produced the segmentation fault (exit code 139 in Docker). I just rebuilt the images with AWS CodeBuild, and everything has worked perfectly fine from then on.
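If you do build images locally rather than in CodeBuild, a minimal sketch of pinning the target platform at build time (the ECR repository URL, account ID, and tag below are placeholders, not the project's actual values) might look like this:
# Build explicitly for linux/amd64 (assuming the Fargate tasks run on x86_64) and push.
# Replace the ECR repository URL with your own.
docker buildx build \
  --platform linux/amd64 \
  -t 123456789012.dkr.ecr.eu-central-1.amazonaws.com/etl-pipeline:latest \
  --push \
  .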

Related

Amplify backend pull failed in web host app

We have set up the CI/CD pipeline in Amplify, and since 22nd Dec the backend build has been failing with the error "Failed to pull the backend", as per the attached screenshot.
Expected behavior: the build should complete successfully. I'm also attaching a screenshot of the last successful build.
I tried to redeploy the last successful build, but that failed with the same error as well.
Version details:
Node.js: 16.18.1
Amplify CLI Version: 10.5.2
OS: Amazon Linux 2
NOTE: The project works fine locally, and the amplify pull command also runs successfully. Locally I'm using Windows.
Thank you.

How do I upgrade the Java runtime on an Amazon EC2 instance?

I have been trying to run a Docker image of my code on an Amazon EC2 instance, but all I got was an error with the message "Unsupported major.minor version 52.0". I therefore installed Amazon Corretto 17 on the instance and ran the image again, but to no avail: I got the same error message.
So, what I want to know is: how do I upgrade the Java runtime of the instance, so that I can run my image as it is without having to rebuild the image with an older version of Java?
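Note that a container runs with the JVM baked into its image, not the Java installed on the EC2 host, so upgrading Java on the instance does not change what the container sees. A quick check of which Java version the image actually ships (the image name here is a placeholder) could be:
# Print the Java version inside the image rather than the host's Java version.
docker run --rm my-app-image java -version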

GitLab Runner suddenly fails to run jobs using Docker Machine and AWS Autoscaling

I use GitLab Runner for running CI jobs on AWS EC2 spot instances, using its autoscaling feature with Docker Machine.
All of a sudden, today GitLab CI failed to run jobs and shows me the following job output for all jobs that I want to start:
Running with gitlab-runner 14.9.1 (f188edd7)
on AWS EC2 runner ...
Preparing the "docker+machine" executor
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
I see in the AWS console that the EC2 instances do get created, but the instances always get stopped immediately by GitLab Runner again.
The GitLab Runner system logs show me the following errors:
ERROR: Machine creation failed error=exit status 1 name=runner-eauzytys-gitlab-ci-1651050768-f84b471e time=1m2.409578844s
ERROR: Error creating machine: Error running provisioning: error installing docker: driver=amazonec2 name=runner-xxxxxxxx-gitlab-ci-1651050768-f84b471e operation=create
So the error seems to be somehow related to Docker Machine. Upgrading GitLab Runner as well as GitLab's Docker Machine fork to the newest versions does not fix the error. I'm using GitLab 14.8 and have tried GitLab Runner 14.9 and 14.10.
What can be the reason for this?
Update:
In the meantime, GitLab has released a new version of its Docker Machine fork which upgrades the default AMI to Ubuntu 20.04. That means that upgrading Docker Machine to the latest version released by GitLab will fix the issue without changing your runner configuration. The latest release can be found here.
Original Workaround/fix:
Explicitly specify the AMI in your runner configuration and do not rely on the default one anymore, i.e. add something like "amazonec2-ami=ami-02584c1c9d05efa69" to your MachineOptions:
MachineOptions = [
"amazonec2-access-key=xxx",
"amazonec2-secret-key=xxx",
"amazonec2-region=eu-central-1",
"amazonec2-vpc-id=vpc-xxx",
"amazonec2-subnet-id=subnet-xxx",
"amazonec2-use-private-address=true",
"amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
"amazonec2-security-group=ci-runners",
"amazonec2-instance-type=m5.large",
"amazonec2-ami=ami-02584c1c9d05efa69", # Ubuntu 20.04 for amd64 in eu-central-1
"amazonec2-request-spot-instance=true",
"amazonec2-spot-price=0.045"
]
You can get a list of Ubuntu AMI IDs here. Be sure to select one that fits your AWS region and instance architecture and is supported by Docker.
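If you prefer to look the AMI up from the command line, a rough sketch using the AWS CLI (the region and the Ubuntu 20.04 amd64 name filter are assumptions to adapt to your setup) could be:
# Find the most recent official Ubuntu 20.04 (Focal) amd64 AMI published by Canonical
# (owner ID 099720109477) in the region the runners use.
aws ec2 describe-images \
  --owners 099720109477 \
  --region eu-central-1 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text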
Explanation:
The default AMI that GitLab Runner / the Docker Machine EC2 driver use is Ubuntu 16.04. The install script for Docker, which is available on https://get.docker.com/ and which Docker Machine relies on, seems to have stopped supporting Ubuntu 16.04 recently. Thus, the installation of Docker fails on the EC2 instance spawned by Docker Machine and the job cannot run.
See also this GitLab issue.
Azure and GCP suffer from similar problems.
Make sure to select an AMI for Ubuntu (not Debian) and that your AWS account is subscribed to it.
What I did:
Subscribe in the AWS Marketplace to an Ubuntu Amazon Machine Image (Ubuntu 20.04 LTS - Focal).
Select Launch instance, choose the region, and copy the AMI ID shown.
I had the same issue since yesterday.
It could be related to GitLab releasing 15.0 with breaking changes (going live on GitLab.com sometime between April 23 – May 22)
https://about.gitlab.com/blog/2022/04/18/gitlab-releases-15-breaking-changes/
but there is no mention there of a missing AMI field that needs to be added to MachineOptions.
Adding the AMI field solved the issue on my side.
Just wanted to add as well: go here for the Ubuntu AMI that corresponds to your region. AMIs are region-specific.
As Moritz pointed out:
Adding:
MachineOptions = [
"amazonec2-ami=ami-02584c1c9d05efa69",
]
solves the issue.

ClassNotFoundException when building the image and pushing it to GCR using jib-maven-plugin in a Bitbucket pipeline

I am getting the below error in my GCP Cloud Run service:
Error: Could not find or load main class com.sdas.demo.sd.Application
Caused by: java.lang.ClassNotFoundException: com.sdas.demo.sd.Application
What I was doing:
I have a Spring Boot application where I use jib-maven-plugin. In the Bitbucket pipeline, I was executing the command below:
mvn clean compile com.google.cloud.tools:jib-maven-plugin:3.1.4:build -Dimage=eu.gcr.io/sdas-demo-dev/temp-service
After that, I deploy this GCR image to Cloud Run using the gcloud command from the Bitbucket pipeline. This deployment failed with the error that it 'could not load main class'.
But if I run mvn clean compile com.google.cloud.tools:jib-maven-plugin:3.1.4:build -Dimage=eu.gcr.io/sdas-demo-dev/temp-service from Git Bash on my computer for the same Spring Boot application code and then deploy it to Cloud Run (via the gcloud command, the console, or the pipeline), it deploys successfully.
I used the 'mainClass' tag under jib-maven-plugin in pom.xml, but it is still unable to find or load the main class.
Can anyone help me identify the problem? Is this a classpath issue or an environment issue?
Issue sorted now.
Root cause:
'No resources found to compile' - I found this message in the build log, and it told me something was wrong with the application package structure.
My system runs Windows 10, and my application directory starts with 'Java.com.demo.sdas' ('J' capitalized). Since Windows is case-insensitive, this does not cause an issue there.
The Bitbucket pipeline runs on a Linux server, which is case-sensitive, so it is unable to find the application directory starting with 'Java.com.demo.sdas'.
Solution: renamed the directory to 'java', and then everything worked as expected.
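As a side note, Git on a case-insensitive filesystem such as Windows may not record a case-only rename in one step; a common workaround (the paths here are illustrative, not the project's real layout) is a two-step rename:
# Rename via an intermediate name so Git records the case change on Windows.
git mv src/main/Java src/main/java-tmp
git mv src/main/java-tmp src/main/java
git commit -m "Rename source directory to lowercase 'java'"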

AWS Batch Failing to launch Dockerfile - standard_init_linux.go:219: exec user process caused: exec format error

I am attempting to use AWS Batch to launch a Linux server which will, in essence, perform the fetch-and-go example included with AWS (downloading a shell script from S3 and running it).
Does AWS Batch work at all for anyone?
The AWS fetch_and_go example always fails, even after following someone else's guide online which mimicked the AWS example.
I have tried creating Dockerfiles for amazonlinux:latest and ubuntu:20.04 with numerous RUN and CMD instructions.
The scripts always seem to fail with the error:
standard_init_linux.go:219: exec user process caused: exec format error
At first I thought this was related to my deployment access rights, maybe within amazonlinux, so I played with chmod 777, chmod +x, etc. on the .sh file.
The final nail in the coffin: my current Dockerfile is literally just:
FROM ubuntu:20.04
I launch this using AWS Batch with no command or parameters passed through, and it still fails with the same error code. This almost hints to me that there is either a setup issue with my AWS Batch (I'm using the default wizard settings, except changing to an a1.medium instance) or that AWS Batch has some major issues.
Has anyone had any success with AWS Batch launching their own Dockerfiles? Could they share their examples and/or setup parameters?
Thank you in advance.
A1 instances are ARM-based, first-generation Graviton CPUs. It is highly likely the image you are trying to run expects an x86 CPU (Intel or AMD). Any instance class with a "g" suffix after the generation number ("c6g" or "m6g") is Graviton2, which is also ARM-based and will not work for the default examples.
You can test whether a specific container will run by launching an A1 instance yourself and running the container on it (after installing Docker). My guess is that you will get the same error. Running on Intel or AMD instances should work.
To leverage Batch with ARM, your containerized application will need to work on ARM. If you point me to the exact example, I can give more details on how to adjust it to run on A1 or Graviton2 instances.
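A quick way to check which architecture a locally built image targets before pushing it (the image name is a placeholder) is to inspect its metadata:
# Prints e.g. linux/amd64 or linux/arm64 depending on how the image was built.
docker image inspect --format '{{.Os}}/{{.Architecture}}' my-batch-image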
I had the same issue, and it was because I built the image locally on my M1 Mac.
If this is your case, try adding --platform linux/amd64 to your docker build command before pushing.
In addition to the other comment, you can create multi-arch images yourself, which will provide the correct architecture:
https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/
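As a rough sketch of that approach (the image name is a placeholder; it assumes Docker Buildx is available and QEMU emulation is set up for the non-native platform), a single tag covering both x86_64 and ARM64 can be built and pushed like this:
# One-time: create and select a builder that can target multiple platforms.
docker buildx create --use
# Build and push a multi-arch manifest for both architectures.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-batch-image:latest \
  --push \
  .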