WildFly 10 restart issue on AWS EC2

I am running a WildFly 10.1.0 server on Linux on an Amazon EC2 instance. I have written start and stop scripts for the server. Whenever I stop the server and restart it after some time, I get the following exception:
WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "rapid.ear")]) - failure description: "WFLYSRV0137: No deployment content with hash dd66eee901c4bf79dd6659873df918e1b639bc1b is available in the deployment content repository for deployment 'rapid.ear'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
When I remove the entry for that deployment (rapid.ear) from standalone.xml I am able to restart the server, but I need a more permanent solution.
The start script written is -
nohup /data/wildfly-10.1.0.Final/bin/standalone.sh -Djavax.net.ssl.trustStore="/usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts" --server-config=standalone.xml &
And the stop script is -
sh /data/wildfly-10.1.0.Final/bin/jboss-cli.sh --connect command=:shutdown

It may not be quite as efficient in terms of I/O, but on a standalone instance you can take advantage of the deployment scanner, which is what I did. I have:
<subsystem xmlns="urn:jboss:domain:deployment-scanner:2.0">
    <deployment-scanner name="myapp" path="/home/wildfly/sites/www.mysite.tld" scan-interval="60000" auto-deploy-exploded="true"/>
</subsystem>
in my standalone-full.xml (you may or may not need the "-full" part). I then deploy my webapp to "/home/wildfly/sites/www.mysite.tld" and can update it as needed. With scan-interval="60000" the scanner only reads the directory once a minute, so it isn't terrible on I/O.
Again, your deployment may be different than mine.
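To see which deployments would trigger WFLYSRV0137 before restarting, you can compare the sha1 hashes in the configuration against the content repository. The sketch below is only an illustration: the function name is hypothetical, the sample XML is a cut-down stand-in for standalone.xml (the namespace version is illustrative), and it assumes the usual repository layout in which a hash is stored as <first two hex characters>/<remaining characters>/content under standalone/data/content.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Cut-down stand-in for standalone.xml; the namespace version is illustrative.
SAMPLE = """<server xmlns="urn:jboss:domain:4.2">
  <deployments>
    <deployment name="rapid.ear" runtime-name="rapid.ear">
      <content sha1="dd66eee901c4bf79dd6659873df918e1b639bc1b"/>
    </deployment>
  </deployments>
</server>"""


def missing_deployment_content(config_xml: str, content_root: str) -> list:
    """Return (deployment name, sha1) pairs whose content file is absent."""
    missing = []
    root = ET.fromstring(config_xml)
    for element in root.iter():
        # Compare local tag names so any urn:jboss:domain version matches.
        if element.tag.rsplit("}", 1)[-1] != "deployment":
            continue
        for child in element:
            sha1 = child.get("sha1")
            if child.tag.rsplit("}", 1)[-1] == "content" and sha1:
                # Assumed layout: <root>/<2 hex chars>/<rest of hash>/content
                path = os.path.join(content_root, sha1[:2], sha1[2:], "content")
                if not os.path.isfile(path):
                    missing.append((element.get("name"), sha1))
    return missing


# Demo against an empty temporary directory standing in for data/content.
with tempfile.TemporaryDirectory() as empty_repo:
    print(missing_deployment_content(SAMPLE, empty_repo))
```

On a real server you would pass the text of standalone.xml and the path of the standalone/data/content directory. A non-empty result means the content was deleted or never persisted between the stop and the start.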


My GKE pods stopped with error "no command specified: CreateContainerError"

Everything was OK and the nodes were fine for months, but suddenly some pods stopped with an error.
I tried deleting the pods and nodes, but the same issue comes back.
Try below possible solutions to resolve your issue:
Solution 1 :
Check for a malformed character in your Dockerfile that could cause the container to crash.
When you encounter CreateContainerError, first check that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. If you don't have access to the Dockerfile, you can instead set a valid command in the command attribute of the container in your pod object.
So the workaround is to not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime and the similar questions SO1 and SO2, and also check this similar GitHub link for more information.
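The ENTRYPOINT check in Solution 1 follows from how Kubernetes combines the pod's command and args fields with the image's ENTRYPOINT and CMD. The sketch below models that rule so you can see when nothing is left to run. The function name is hypothetical, and the "Entrypoint" and "Cmd" keys are assumed to come from the Config section of the image metadata (for example as shown by docker inspect).

```python
def resolve_start_command(container: dict, image_config: dict) -> list:
    """Combine pod-level command/args with the image's ENTRYPOINT/CMD."""
    command = container.get("command")
    args = container.get("args")
    entrypoint = image_config.get("Entrypoint") or []
    cmd = image_config.get("Cmd") or []
    if command:
        # A pod-level command replaces ENTRYPOINT and discards the image CMD.
        return list(command) + list(args or [])
    if args:
        # Pod-level args replace only the image CMD.
        return list(entrypoint) + list(args)
    # Nothing set on the pod: fall back to the image defaults.
    return list(entrypoint) + list(cmd)


# An image without ENTRYPOINT/CMD and a pod without command/args
# leaves nothing to execute.
print(resolve_start_command({}, {"Entrypoint": None, "Cmd": None}))
```

An empty result corresponds to the situation the error describes: neither the image nor the pod specifies anything to execute.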
Solution 2 :
The kubectl describe pod podname command provides detailed information about a pod. Use its output to look for clues; if it reports Insufficient CPU, follow the solution below.
The solution is to either:
1) Upgrade the boot disk: if you are using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2) Increase the disk size.
3) Use a node pool with a machine type that has more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, update the GKE version of your cluster to one of the fixed versions (see Manually upgrading the control plane).
Also check whether you have updated your tooling in the last year to use the new kubectl authentication plugin coming in GKE v1.26.
Solution 3 :
If you have a GitLab pipeline that deploys an image to a GKE cluster, check the version of the GitLab runner that handles the jobs of your pipeline.
It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Deactivate those runners, leave only runners on the latest version in the pool, and replay all pipelines.
Also check whether the GitLab CI script uses an old Docker image such as docker:19.03.5-dind; updating it to docker:dind helps Kubernetes start the pod again.

How to invoke AWS SAM locally using remote docker (as opposed to docker desktop)?

I have AWS SAM installed on a Windows machine. I have followed the instructions here https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-getting-started-hello-world.html to create a test Hello World application.
I have a Docker server running on a separate (Linux) VM. How do I invoke AWS SAM locally against it?
I have tried the following:
sam local start-api --container-host-interface 0.0.0.0 --container-host 192.168.28.168
where 192.168.28.168 is the Linux VM where docker server is running. (I.e. different to the Windows machine I’m developing on).
However, I get “Error: Cannot find module”:
PS C:\Develop\AWS\sam-app> sam local start-api --container-host-interface 0.0.0.0 --container-host 192.168.28.168
Mounting HelloWorldFunction at http://127.0.0.1:3000/hello [GET]
You can now browse to the above endpoints to invoke your functions. You do not need to restart/reload SAM CLI while working on your functions, changes will be reflected instantly/automatically. You only need to restart SAM CLI if you update your AWS SAM template
2021-09-24 07:50:10 * Running on http://127.0.0.1:3000/ (Press CTRL+C to quit)
Invoking app.lambdaHandler (nodejs14.x)
Skip pulling image and use local one: amazon/aws-sam-cli-emulation-image-nodejs14.x:rapid-1.27.2.
Mounting C:\Develop\AWS\sam-app\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
START RequestId: bd6b8177-56bb-4464-8ead-8c46809e6c6c Version: $LATEST
2021-09-24T06:50:35.674Z undefined ERROR Uncaught Exception {"errorType":"Runtime.ImportModuleError","errorMessage":"Error: Cannot find module 'app'\nRequire stack:\n- /var/runtime/UserFunction.js\n- /var/runtime/index.js","stack":["Runtime.ImportModuleError: Error: Cannot find module 'app'","Require stack:","- /var/runtime/UserFunction.js","- /var/runtime/index.js"," at _loadUserApp (/var/runtime/UserFunction.js:100:13)"," at Object.module.exports.load (/var/runtime/UserFunction.js:140:17)"," at Object.<anonymous> (/var/runtime/index.js:43:30)"," at Module._compile (internal/modules/cjs/loader.js:1085:14)"," at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)"," at Module.load (internal/modules/cjs/loader.js:950:32)"," at Function.Module._load (internal/modules/cjs/loader.js:790:14)"," at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)"," at internal/main/run_main_module.js:17:47"]}
time="2021-09-24T06:50:35.691" level=panic msg="ReplyStream not available"
SAM is communicating with the container ok, as evidenced by the START RequestId:… line. However, it’s failing to find the app.js to run.
I suspect it’s something to do with volume mapping.
I’ve tried setting --docker-volume-basedir to various values, but it seems to make no difference.
The “Remote Docker” section on this page https://github.com/thoeni/aws-sam-local#remote-docker suggests that “the project directory must be pre-mounted on the remote host where the Docker is running”. But how do I do that, when I’m not using docker desktop?
There are some similar sounding suggestions here https://github.com/aws/aws-sam-cli/issues/2837#issuecomment-879655277 which seem to involve modifying the dockerfile to mount a volume. However, I don’t have a dockerfile – SAM is just pulling the image automatically when invoked.
Any ideas? Is it even possible to invoke AWS SAM locally using a remote Docker server as opposed to Docker Desktop?
The section “Step 3: Install Docker (optional)” of the SAM install guide https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install-windows.html describes setting up shared drives: “The AWS SAM CLI requires that the project directory, or any parent directory, is listed in a shared drive.” However, it’s evident that it’s expecting Docker Desktop, not docker running on a remote server.
Maybe it’s just not possible to invoke AWS SAM locally without Docker Desktop?
Ok, I've now realised where I went wrong.
At this point in the SAM log:
Mounting C:\Develop\AWS\sam-app\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
AWS SAM is attempting to bind mount the C:\Develop\AWS\... directory on the Docker host to /var/task in the Docker container.
My mistake was thinking that it was mounting the actual directory on my local development machine.
I logged into the Docker host machine, and could see the directory structure had been created: /c/Develop/AWS/.... I transferred app.js from my local development machine to the Docker host's directory, and bingo - it now works. :-)
So, now the description of the --docker-volume-basedir option in the AWS SAM developer guide makes more sense:
The location of the base directory where the AWS SAM file exists. If Docker is running on a remote machine, you must mount the path where the AWS SAM file exists on the Docker machine, and modify this value to match the remote machine.
So I guess I need to create an SMB mapping from the application folder on my Windows development machine to a folder on the Linux Docker host, and ensure that the Docker host (Linux) folder gets used for running the application by setting --docker-volume-basedir accordingly.
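To work out which directory has to exist on the Docker host, the Windows path from the SAM log can be translated into the form observed above (/c/Develop/AWS/...). This is only a sketch: the function name is hypothetical, and the lower-case drive letter as the first path component is an assumption based on the directory seen on this particular Docker host, so other setups may map drives differently.

```python
from pathlib import PurePosixPath, PureWindowsPath


def docker_host_path(windows_path: str) -> str:
    """Translate C:\\dir\\sub into /c/dir/sub (convention assumed from the post)."""
    path = PureWindowsPath(windows_path)
    if not path.drive:
        raise ValueError("expected an absolute Windows path with a drive letter")
    drive = path.drive.rstrip(":").lower()
    # path.parts[0] is the anchor ("C:\\"); the rest are directory names.
    return str(PurePosixPath("/", drive, *path.parts[1:]))


print(docker_host_path(r"C:\Develop\AWS\sam-app\.aws-sam\build\HelloWorldFunction"))
# prints /c/Develop/AWS/sam-app/.aws-sam/build/HelloWorldFunction
```

The printed path is where the built function code (app.js and its dependencies) must be present on the Linux host, whether copied manually or exposed through a shared folder.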

Logstash Google Pubsub Input Plugin fails to load file and pull messages

I'm getting this error when trying to run Logstash pipeline with a configuration that is using google_pubsub on a docker container running in my production env:
2021-09-16 19:13:25 FATAL runner:135 - The given configuration is invalid. Reason: Unable to configure plugins: (PluginLoadingError) Couldn't find any input plugin named 'google_pubsub'. Are you sure this is correct? Trying to load the google_pubsub input plugin resulted in this error: Problems loading the requested plugin named google_pubsub of type input. Error: RuntimeError
you might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then resolve the jars with `lock_jars` command
no such file to load -- com/google/cloud/google-cloud-pubsub/1.37.1/google-cloud-pubsub-1.37.1 (LoadError)
2021-09-16 19:13:25 ERROR Logstash:96 - java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
This seems to happen randomly when re-installing the plugin. I thought it was a proxy issue, but I have the Google domain enabled in the whitelist. It might be the wrong domain, or something else might be missing. Still, that doesn't explain the random failures.
Also, when I run the pipeline in my machine I get GCP events, but when I do it on a VM - no Pubsub messages are being pulled. Could it be a firewall rule blocking them?
The error message suggests there is a problem loading the 'google_pubsub' input plugin. This error generally occurs when the Pub/Sub input plugin is not installed properly, so make sure that you are installing the Logstash plugin for Pub/Sub correctly.
For example, to install the Logstash plugin for Pub/Sub on a VM:
sudo -u root sudo -u logstash bin/logstash-plugin install logstash-input-google_pubsub
For a detailed demo refer to this community tutorial.
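To confirm that the jar named in the LoadError is really absent after an install, you can search the Logstash installation for it. The sketch below is a generic file search, not part of Logstash: the function name is hypothetical, and the directory you pass as search_root (your Logstash home inside the container) is an assumption you need to adjust.

```python
import os


def find_jar(search_root: str, jar_stem: str) -> list:
    """Return all files under search_root whose path ends with jar_stem + '.jar'."""
    wanted = os.path.normpath(jar_stem + ".jar")
    matches = []
    for directory, _subdirs, files in os.walk(search_root):
        for name in files:
            full_path = os.path.join(directory, name)
            if full_path.endswith(os.sep + wanted):
                matches.append(full_path)
    return sorted(matches)


# The stem is copied from the "no such file to load" line of the error.
JAR_STEM = "com/google/cloud/google-cloud-pubsub/1.37.1/google-cloud-pubsub-1.37.1"
```

Calling find_jar with your Logstash home and JAR_STEM returns an empty list when the jar was not downloaded, which would point to an incomplete plugin install (for example a blocked download) rather than a configuration error.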

Deployment on AWS Elastic Beanstalk with Docker fails

I'm developing a web application with Play framework and I'm running it on AWS Elastic Beanstalk using a single docker container and a load balancer. Normally, everything is running fine, but when I rebuild the whole environment I get the following error:
Command failed on instance. Return code: 6 Output: (TRUNCATED)... in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11 nginx: [emerg] host not found in upstream "docker" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:24 nginx: configuration file /etc/nginx/nginx.conf test failed.
When I log into the EC2 I can see that no docker image is running and therefore the Nginx server cannot start. I cannot see any other error in the logs (or maybe I don't know where to look). The strange thing is that the same version worked fine before rebuilding the environment.
I'm using the following Dockerfile for the deployment:
FROM java
COPY <app_folder> /opt/<app_name>
WORKDIR /opt/<app_name>
CMD [ "/opt/<app_name>/bin/<app_name>", "-mem", "512", "-J-server" ]
EXPOSE 9000
Any ideas what the problem could be or where to check for more details?
I had this same problem. elasticbeanstalk-nginx-docker-proxy.conf is referring to proxy_pass http://docker but the definition of that is missing. You need to add something like
# List of application servers
upstream docker {
    server 127.0.0.1:8080; # your app
}
(Make sure it's outside of the server directive.)
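A quick way to spot this class of problem is to compare the names used in proxy_pass with the upstream blocks that are defined. The sketch below is a rough regex-based check, not a real nginx parser: the function name is hypothetical, and as a simplification it treats any dot-free host other than localhost as an upstream name.

```python
import re


def undefined_upstreams(nginx_conf: str) -> set:
    """Return proxy_pass targets that look like upstream names but lack a block."""
    # Drop comments so commented-out directives are not counted.
    text = re.sub(r"#[^\n]*", "", nginx_conf)
    defined = set(re.findall(r"\bupstream\s+([^\s{;]+)\s*\{", text))
    targets = set(re.findall(r"\bproxy_pass\s+https?://([^\s:/;]+)", text))
    # Simplification: hosts with a dot (IPs, domain names) and localhost
    # are assumed to be resolvable and are ignored.
    candidates = {t for t in targets if "." not in t and t != "localhost"}
    return candidates - defined


BROKEN = "server { location / { proxy_pass http://docker; } }"
FIXED = "upstream docker { server 127.0.0.1:8080; }\n" + BROKEN

print(undefined_upstreams(BROKEN))
print(undefined_upstreams(FIXED))
```

For the configuration in the question, "docker" would be reported until an upstream docker block is added outside the server directive.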
I have just been working through the same challenge (deploying an updated Docker image to Elastic Beanstalk). It depends on what exactly you want to do, but what I found is that once you have the EB CLI set up, you can just use the eb deploy command to push out your code changes without worrying about the image at all.
Granted, you'd still want to push your image to your repo for sharing with other developers, or if you actually need to change the environment configuration for some reason. But if you're just looking to push code, look into eb deploy.
As for the specifics of your error, unfortunately I can't be of much help there. Good luck!

Application Logs on Cloudfoundry don't show up

I have a problem with my application logs on my Cloudfoundry deployment.
I've deployed Cloudfoundry in a somewhat minimized design based on the tiny-aws deployment of https://github.com/cloudfoundry-community/cf-boshworkspace.
I further minimized the deployment and put everything from the VMs "api", "backbone", "health" and "services" together on the api-machines.
So I have the following VMs:
api (2 instances)
data (1 instance)
runner (2 instances)
haproxy (1 public and 1 private proxy)
Cloudfoundry version is 212.
The deployment itself seems to work. I can deploy apps and they start up.
But the logs from my applications don't show up when I run
"cf logs my-app --recent"
I've tried several log configurations in my spring-boot app:
the standard configuration without modifications, which should log to STDOUT according to the spring-boot documentation
an explicitly set log4j.properties file, which was configured to log to STDOUT as well
a log4j-2 configuration for logging to STDOUT
a spring-boot configuration which logs to a file
With the last configuration, the file was created and my logs were shown when I ran "cf files my-app log/my-app.log"
I tried to debug where my logs get lost, but I couldn't find anything.
The dea_logging_agent seems to run and has the correct NATS location configured, the dea itself too.
Loggregator seems to run well on the api-host too and seems to be connected to NATS too.
So my question is: In which locations should I search to find out where my logs go?
Thank you very much.