Spring Cloud Data Flow: Cannot run program "docker"

I want to deploy Spring Boot applications using Kinesis streams on a Kubernetes cluster on AWS.
I used kops on an AWS EC2 (Amazon Linux) instance to create my cluster and deployed it using terraform.
I installed Spring Cloud Data Flow for Kubernetes using the Helm chart. All my pods are up and running and I can access the Spring Cloud Data Flow interface in order to register my dockerized apps. I am using ECR repositories to store my Docker images.
When I want to deploy the stream (composed of a time-source and a log-sink), a big red error message pops up. I checked the log of the Skipper pod and I have the following error message starting with:
org.springframework.cloud.skipper.SkipperException: Could not install AppDeployRequest
and ending with:
Caused by: java.io.IOException: Cannot run program "docker" (in directory "/tmp/spring-cloud-deployer-5769885450333766520/time-log-kinesis-stream-1539963209716/time-log-kinesis-stream.log-sink-kinesis-app-v1"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_111-internal]
at org.springframework.cloud.deployer.spi.local.LocalAppDeployer$AppInstance.start(LocalAppDeployer.java:386) ~[spring-cloud-deployer-local-1.3.7.RELEASE.jar!/:1.3.7.RELEASE]
at org.springframework.cloud.deployer.spi.local.LocalAppDeployer$AppInstance.start(LocalAppDeployer.java:414) ~[spring-cloud-deployer-local-1.3.7.RELEASE.jar!/:1.3.7.RELEASE]
at org.springframework.cloud.deployer.spi.local.LocalAppDeployer$AppInstance.access$200(LocalAppDeployer.java:296) ~[spring-cloud-deployer-local-1.3.7.RELEASE.jar!/:1.3.7.RELEASE]
at org.springframework.cloud.deployer.spi.local.LocalAppDeployer.deploy(LocalAppDeployer.java:199) ~[spring-cloud-deployer-local-1.3.7.RELEASE.jar!/:1.3.7.RELEASE]
... 54 common frames omitted
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_111-internal]
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) ~[na:1.8.0_111-internal]
at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_111-internal]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_111-internal]
... 58 common frames omitted
I already had this error when I tried to deploy on a local k8s cluster on Windows 10, and I thought it was linked to the Windows 10 platform.
I am using spring-cloud-dataflow-server-kubernetes at version 1.6.2.RELEASE.
I really have no clue why this error appears. Thanks!

It looks like the docker command is not found by the SCDF local deployer's ProcessBuilder when it tries to run the docker executable from this path:
/tmp/spring-cloud-deployer-5769885450333766520/time-log-kinesis-stream-1539963209716/time-log-kinesis-stream.log-sink-kinesis-app-v1
SCDF sets the above path as the working directory before running the docker command, so docker is expected to run from this location.

I have found where the issue was. My bad, the problem is always between the keyboard and the chair!
I wanted to remove all the metrics processing in the skipper-config.yaml file and introduced a typo in the configuration file. The JSON env variable data.spring.application.json for the Skipper launch was not valid, hence the DeployerInitializationService never saw the properties it needed to add the Kubernetes platform to the repository!
Now in the logs and in the dataflow shell I have the default and the minikube accounts. Thanks for your help anyway :)
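For reference, a minimal sketch of what a valid platform entry could look like in the skipper-config.yaml ConfigMap (the account name and property values below are only illustrative, based on the documented Skipper/Kubernetes deployer properties, not the exact chart defaults):
data:
  spring.application.json: |-
    {
      "spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.namespace": "default",
      "spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.limits.memory": "1024Mi"
    }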

Related

My GKE pods stopped with error "no command specified: CreateContainerError"

Everything was OK and the nodes were fine for months, but suddenly some pods stopped with an error.
I tried deleting the pods and nodes, but the same issue persists.
Try the possible solutions below to resolve your issue:
Solution 1:
Check for a malformed character in your Dockerfile, which can cause the container to crash.
The first thing to check when you encounter CreateContainerError is that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. However, if you don't have access to the Dockerfile, you can configure your pod object by using a valid command in the command attribute of the object.
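For example, a minimal sketch of overriding the image's entrypoint through the pod's command field (the pod name, image, and command below are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: your-image:latest
      # a valid command here overrides a missing or broken ENTRYPOINT
      command: ["/bin/sh", "-c", "exec /app/start.sh"]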
Another workaround is to not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, the similar questions SO1 and SO2, and also check this similar GitHub link for more information.
Solution 2:
The kubectl describe pod <podname> command provides detailed information about each of the pods that make up your Kubernetes infrastructure. With its help you can check for clues; if you see Insufficient CPU, follow the solution below.
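For example, to inspect a pod (the pod name and namespace are placeholders):
kubectl describe pod <pod-name> -n <namespace>        # check the Events section for "Insufficient cpu"
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp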
The solution is to either:
1) Upgrade the boot disk: if using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2) Increase the disk size.
3) Use a node pool with a machine type that has more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version of your cluster by manually upgrading the control plane to one of the fixed versions.
Also check whether you have updated your cluster in the last year to use the new kubectl authentication plugin coming in GKE v1.26.
Solution 3:
If you have a pipeline on GitLab that deploys an image to a GKE cluster, check the version of the GitLab runner that handles the jobs of your pipeline.
It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Simply deactivate those runners, keep only runners running the latest version in the pool, and replay all pipelines.
Also check whether the GitLab CI script uses an old docker image like docker:19.03.5-dind; updating to docker:dind helps Kubernetes start the pod again.
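For example, a hedged sketch of a .gitlab-ci.yml build job (the job name and build commands are illustrative) with the dind service image updated:
build-image:
  image: docker:latest
  services:
    - docker:dind   # instead of a pinned docker:19.03.5-dind
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"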

How to invoke AWS SAM locally using remote docker (as opposed to docker desktop)?

I have AWS SAM installed on a Windows machine. I have followed the instructions here https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started-hello-world.html to create a test Hello World application.
I have a docker server running on a separate (Linux) VM. How do I invoke AWS SAM locally?
I have tried the following:
sam local start-api --container-host-interface 0.0.0.0 --container-host 192.168.28.168
where 192.168.28.168 is the Linux VM where the docker server is running (i.e. different from the Windows machine I'm developing on).
However, I get “Error: Cannot find module”:
PS C:\Develop\AWS\sam-app> sam local start-api --container-host-interface 0.0.0.0 --container-host 192.168.28.168
Mounting HelloWorldFunction at http://127.0.0.1:3000/hello [GET]
You can now browse to the above endpoints to invoke your functions. You do not need to restart/reload SAM CLI while working on your functions, changes will be reflected instantly/automatically. You only need to restart SAM CLI if you update your AWS SAM template
2021-09-24 07:50:10 * Running on http://127.0.0.1:3000/ (Press CTRL+C to quit)
Invoking app.lambdaHandler (nodejs14.x)
Skip pulling image and use local one: amazon/aws-sam-cli-emulation-image-nodejs14.x:rapid-1.27.2.
Mounting C:\Develop\AWS\sam-app\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
START RequestId: bd6b8177-56bb-4464-8ead-8c46809e6c6c Version: $LATEST
2021-09-24T06:50:35.674Z undefined ERROR Uncaught Exception {"errorType":"Runtime.ImportModuleError","errorMessage":"Error: Cannot find module 'app'\nRequire stack:\n- /var/runtime/UserFunction.js\n- /var/runtime/index.js","stack":["Runtime.ImportModuleError: Error: Cannot find module 'app'","Require stack:","- /var/runtime/UserFunction.js","- /var/runtime/index.js"," at _loadUserApp (/var/runtime/UserFunction.js:100:13)"," at Object.module.exports.load (/var/runtime/UserFunction.js:140:17)"," at Object.<anonymous> (/var/runtime/index.js:43:30)"," at Module._compile (internal/modules/cjs/loader.js:1085:14)"," at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)"," at Module.load (internal/modules/cjs/loader.js:950:32)"," at Function.Module._load (internal/modules/cjs/loader.js:790:14)"," at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)"," at internal/main/run_main_module.js:17:47"]}
time="2021-09-24T06:50:35.691" level=panic msg="ReplyStream not available"
SAM is communicating with the container ok, as evidenced by the START RequestId:… line. However, it’s failing to find the app.js to run.
I suspect it’s something to do with volume mapping.
I’ve tried setting --docker-volume-basedir to various values, but it seems to make no difference.
The “Remote Docker” section on this page https://github.com/thoeni/aws-sam-local#remote-docker suggests that “the project directory must be pre-mounted on the remote host where the Docker is running”. But how do I do that, when I’m not using docker desktop?
There are some similar sounding suggestions here https://github.com/aws/aws-sam-cli/issues/2837#issuecomment-879655277 which seem to involve modifying the dockerfile to mount a volume. However, I don’t have a dockerfile – SAM is just pulling the image automatically when invoked.
Any ideas? Is it even possible to invoke AWS SAM locally using a remote docker server as opposed to Docker Desktop?
The section “Step 3: Install Docker (optional)” of the SAM install guide https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install-windows.html describes setting up shared drives: “The AWS SAM CLI requires that the project directory, or any parent directory, is listed in a shared drive.” However, it’s evident that it’s expecting Docker Desktop, not docker running on a remote server.
Maybe it’s just not possible to invoke AWS SAM locally without Docker Desktop?
Ok, I've now realised where I went wrong.
At this point in the SAM log:
Mounting C:\Develop\AWS\sam-app\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
AWS SAM is attempting to bind mount the C:\Develop\AWS\... directory on the Docker host to /var/task in the Docker container.
My mistake was thinking that it was mounting the actual directory on my local development machine.
I logged into the Docker host machine, and could see the directory structure had been created: /c/Develop/AWS/.... I transferred app.js from my local development machine to the Docker host's directory, and bingo - it now works. :-)
So, now the description in the AWS SAM developer guide for --docker-volume-basedir makes more sense:
The location of the base directory where the AWS SAM file exists. If Docker is running on a remote machine, you must mount the path where the AWS SAM file exists on the Docker machine, and modify this value to match the remote machine.
So I guess I need to create an SMB mapping from the application folder on my Windows development machine to a folder on the Linux Docker host, and ensure that the Docker host (Linux) folder gets used for running the application by setting --docker-volume-basedir accordingly.
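A hedged sketch of what that could look like from the Windows side, assuming the project directory has already been mounted/synced to /c/Develop/AWS/sam-app on the Linux Docker host (the IP address, port, and paths are examples):
# point the SAM CLI at the remote Docker daemon
$env:DOCKER_HOST = "tcp://192.168.28.168:2375"
sam local start-api `
    --container-host 192.168.28.168 `
    --container-host-interface 0.0.0.0 `
    --docker-volume-basedir /c/Develop/AWS/sam-app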

Logstash Google Pubsub Input Plugin fails to load file and pull messages

I'm getting this error when trying to run a Logstash pipeline with a configuration that uses google_pubsub, in a docker container running in my production env:
2021-09-16 19:13:25 FATAL runner:135 - The given configuration is invalid. Reason: Unable to configure plugins: (PluginLoadingError) Couldn't find any input plugin named 'google_pubsub'. Are you sure this is correct? Trying to load the google_pubsub input plugin resulted in this error: Problems loading the requested plugin named google_pubsub of type input. Error: RuntimeError
you might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then resolve the jars with `lock_jars` command
no such file to load -- com/google/cloud/google-cloud-pubsub/1.37.1/google-cloud-pubsub-1.37.1 (LoadError)
2021-09-16 19:13:25 ERROR Logstash:96 - java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
This seems to happen randomly when re-installing the plugin. I thought it was a proxy issue, but I have the Google domain enabled in the whitelist. It might be the wrong one / I might be missing something. Still, that doesn't explain the random failures.
Also, when I run the pipeline on my machine I get GCP events, but when I run it on a VM no Pub/Sub messages are pulled. Could it be a firewall rule blocking them?
The error message suggests there is a problem loading the 'google_pubsub' input plugin. This error generally occurs when the Pub/Sub input plugin is not installed properly. Kindly ensure that you are installing the Logstash plugin for Pub/Sub correctly.
For example, installing the Logstash plugin for Pub/Sub on a VM:
sudo -u root sudo -u logstash bin/logstash-plugin install logstash-input-google_pubsub
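You can then confirm that the plugin is visible to Logstash before starting the pipeline:
bin/logstash-plugin list --verbose | grep google_pubsub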
For a detailed demo refer to this community tutorial.

AWS EB docker-compose deployment from private registry access forbidden

I'm trying to get docker-compose deployment to AWS Elastic Beanstalk working, in which the docker images are pulled from a private registry hosted by GitLab.
The strange thing is that the initial deployment works perfectly: it pulls the image from the private registry and starts the containers using docker-compose, and the webpage (served by Django) is accessible through the host.
Deploying a new version using the same docker-compose and the same docker image will result in an error while pulling the docker image:
2021/03/16 09:28:34.957094 [ERROR] An error occurred during execution of command [app-deploy] - [Run Docker Container]. Stop running the command. Error: failed to run docker containers: Command /bin/sh -c docker-compose up -d failed with error exit status 1. Stderr:Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating network "current_default" with the default driver
Pulling redis (redis:alpine)...
Pulling mysql (mysql:5.7)...
Pulling project.dockertest(registry.gitlab.com/company/spikes/dockertest:latest)...
Get https://registry.gitlab.com/v2/company/spikes/dockertest/manifests/latest: denied: access forbidden
2021/03/16 09:28:34.957104 [INFO] Executing cleanup logic
Setup
AWS Elastic Beanstalk 64bit Amazon Linux 2/3.2
GitLab registry credentials are stored within an S3 bucket, with the filename .dockercfg, and with the following content:
{
  "auths": {
    "registry.gitlab.com": {
      "auth": "base64 encoded username:personal_access_token"
    }
  },
  "HttpHeaders": {
    "User-Agent": "Docker-Client/18.03.1-ce (linux)"
  }
}
The repository contains a v3 Dockerrun.aws.json file to refer to the credential file in S3:
{
  "AWSEBDockerrunVersion": "3",
  "Authentication": {
    "bucket": "gitlab-dockercfg",
    "key": ".dockercfg"
  }
}
Reproduce
Set up a docker-compose.yml that uses a service with a private docker image (which can be pulled with the credentials set up in the dockercfg within S3).
Create a new application that uses the docker platform.
eb init testapplication --platform=docker --region=eu-west-1
Note: region must be the same as the S3 bucket containing the dockercfg.
Initial deployment (this will succeed)
eb create testapplication-test --branch_default --cname testapplication-test --elb-type=application --instance-types=t2.micro --min-instance=1 --max-instances=4
The initial deployment shows that the image is available and can be started:
2021/03/16 08:58:07.533988 [INFO] save docker tag command: docker tag 5812dfe24a4f redis:alpine
2021/03/16 08:58:07.533993 [INFO] save docker tag command: docker tag f8fcde8b9ae2 mysql:5.7
2021/03/16 08:58:07.533998 [INFO] save docker tag command: docker tag 1dd9b65d6a9f registry.gitlab.com/company/spikes/dockertest:latest
2021/03/16 08:58:07.534010 [INFO] Running command /bin/sh -c docker rm `docker ps -aq`
Without changing anything in the local repository or the remote docker image on the private registry, let's do a redeployment, which will trigger the error:
eb deploy testapplication-test
This will fail with the following output:
...
2021-03-16 10:02:28 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2021-03-16 10:02:29 ERROR Unsuccessful command execution on instance id(s) 'i-0dc445d118ac14b80'. Aborting the operation.
2021-03-16 10:02:29 ERROR Failed to deploy application.
ERROR: ServiceError - Failed to deploy application.
And logs of the instance show (/var/log/eb-engine.log):
Pulling redis (redis:alpine)...
Pulling mysql (mysql:5.7)...
Pulling project.dockertest (registry.gitlab.com/company/spikes/dockertest:latest)...
Get https://registry.gitlab.com/v2/company/spikes/dockertest/manifests/latest: denied: access forbidden
2021/03/16 10:02:25.902479 [INFO] Executing cleanup logic
Steps I've tried to debug or solve the issue
Rename dockercfg to .dockercfg on S3 (mentioned somewhere on the internet as a possible solution)
Use the 'old' docker config format instead of the one generated by docker 1.7+. But later on I figured out that Amazon Linux 2 instances are compatible with the new format together with Dockerrun v3
Having an incorrectly formatted dockercfg on S3 causes a deployment error about the malformed file (so it does actually do something with the dockercfg from S3)
Documentation
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/single-container-docker-configuration.html
I'm out of debug options, and I've no idea where to look any further to debug this problem. Perhaps someone can see what is going wrong here?
First of all, the issue described above is a bug confirmed by Amazon. To get the deployment working on our side, we contacted Amazon support.
They have a fix in place which should be released this month, so keep an eye on the changelog of the Elastic Beanstalk platform: https://docs.aws.amazon.com/elasticbeanstalk/latest/relnotes/relnotes.html
Although the upcoming release should contain the fix, there is a workaround available to get the docker-compose deployment working.
Elastic Beanstalk allows hooks to be executed during the deployment, which can be used to fetch the .dockercfg from an S3 bucket in order to authenticate against the private registry.
To do so, create the following file and directories from the root of the project:
File location: .platform/hooks/predeploy/docker_login
#!/bin/bash
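# fetch the registry credentials from S3 so docker can authenticate against the private registry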
aws s3 cp s3://{{bucket_name_to_use}}/.dockercfg ~/.docker/config.json
Important: Add execution rights to this file (for example: chmod +x .platform/hooks/predeploy/docker_login)
To support instance configuration changes, please symlink the hooks directory to confighooks:
ln -s .platform/hooks/ .platform/confighooks/
Updating configuration requires the .dockercfg credentials to be fetched too.
This should enable continuous deployments to the same EB instance without the authentication errors, because the hook will be executed before the docker images are pulled.
Some background:
The docker daemon reads credentials from ~/.docker/config.json by default on traditional Linux systems. On the initial deploy this file will exist on the Elastic Beanstalk instance. On the next deployment this file is removed. Unfortunately, the .dockercfg is not re-fetched on that deployment, therefore the docker daemon does not have the correct credentials to authenticate with.
I was dealing with the same errors while trying to pull images from a privately hosted GitLab instance. I was able to resolve them by including, alongside the auth field of the .dockercfg file, the email address that was associated with the generated token.
The following file format worked for me:
"registry.gitlab.com" {
"auth": "base64 encoded username:personal_access_token",
"email": "email for personal access token"
}
In my case I used a Project Access Token, which has an e-mail address associated with it once it is created.
The file format in the Elastic Beanstalk documentation for the authentication file here indicates that this is the required format, though the Docker versions for which it says this format is required are almost certainly outdated, since we are running Docker ^19.

Docker executable not found in PATH when using AWS batch/ECS

I am trying to run a simple Dockerized Python script with AWS batch.
Is there a problem with my Docker image?
I have built the Docker image locally and it runs fine. I pushed the image to an AWS repository, and pulling this remote image to my local machine also works correctly.
Problem
I have set up my compute environment, job queue, and job definition, but I get this error:
CannotStartContainerError: Error response from daemon:
OCI runtime create failed: container_linux.go:370:
starting container process caused:
exec: "docker": executable file not found in $PATH: unknown
when I run
["docker","run","-t","111111111111.dkr.ecr.us-region-X.amazonaws.com/myimage:latest","python3","hello_world.py","--MSG","ok"]
Is Docker installed?
I am using the ECS_AL2 image type. When I start an EC2 instance with this AMI and ssh into it, I can see that Docker is already installed; docker run works fine, for instance.
Is there a (generic) problem with my compute env, job queue, or job def?
When I instead try to run the command echo hello this works fine.
Appreciate any advice/help you can provide.
UPDATE - ANSWER
@samtoddler helped me realize that I only needed
["python3","hello_world.py","--MSG","ok"]
in the Command statement of the job definition.
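For reference, a hedged sketch of how that might look in the job definition's containerProperties (the image URI is the one from the question; vcpus and memory are placeholder values):
{
  "image": "111111111111.dkr.ecr.us-region-X.amazonaws.com/myimage:latest",
  "vcpus": 1,
  "memory": 512,
  "command": ["python3", "hello_world.py", "--MSG", "ok"]
}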
This error:
CannotStartContainerError: Error response from daemon:
means it is coming from the docker daemon, so docker is doing its job.
It seems like you have some trouble with your docker image: how it is packaged and how you are trying to pass all those vars.
Please check the Docker Image CMD section of the documentation on how to use ENTRYPOINT and CMD.
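As an illustration only (not your actual Dockerfile), a common pattern is to bake the interpreter and script into ENTRYPOINT and keep overridable arguments in CMD:
FROM python:3.9-slim
COPY hello_world.py .
ENTRYPOINT ["python3", "hello_world.py"]
CMD ["--MSG", "ok"]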
There is some explanation in this question docker-oci-runtime-create-failed-container-linux-go349-starting-container-pro