Elastic Beanstalk stalling after running out of memory

Elastic Beanstalk stalling after running out of memory - amazon-web-services

I have an Elastic Beanstalk instance which then runs out of memory starts giving 500 errors and the health goes to degraded which is expected behavior. Now when the memory is released server health is back to normal the HTTPS requests are stalling. There is no CPU or memory activity on the server but server still fails to load 2 out of 5 requests. The issue is resolved once i restart the Elastic Beanstalk instance.
I checked the error logs but there are no records seems as if the requests are not hitting the server they are being blocked at the load balancer. Is any one facing this issue? I tried running few different applications on the Elastic Beanstalk instance but results are the same.
Any suggestions or pointing me to docs will be appreciated as i didn't find anything in their docs about this.
Thanks

It is possible that some important process was OOM killed, and has not started back up properly. You can look at this question for help on that: Finding which process was killed by Linux OOM killer
It may be possible to simply reboot the server, and everything may start back up properly again.
The best solution is to not run out of memory in the first place. If you are unable to upgrade the server, perhaps due to cost reasons, then consider adding a swap file.
I have an Elastic Beanstalk application in which I add a swap file using an ebextension.
# .ebextensions/01-swap.config
commands:
"01-swap":
command: |
dd if=/dev/zero of=/var/swapfile bs=1M count=512
chmod 600 /var/swapfile
mkswap /var/swapfile
swapon /var/swapfile
echo "/var/swapfile none swap sw 0 0" >> /etc/fstab
test: test ! -f /var/swapfile
https://github.com/stefansundin/rssbox/blob/1e40fe60f888ad0143e5c4fb83c1471986032963/.ebextensions/01-swap.config

Related

Google Cloud Ops Agent

I am having an issue where Google Cloud Ops Agent logging gathers a lot of data and fills up my entire debian server hard drive in about 3 weeks due to the ever increasing size of the log file.
I do not want to increase the size of my server hard drive.
Does anyone know how to configure Google Cloud Ops Agent so that it only retains log data for the previous 7 days ?
EDIT: Google Cloud Ops Agent log file is stored in directory below
/var/log/google-cloud-ops-agent/subagents/logging-module.log

I faced the same issue recently while using agent 2.11.0. And it's not just an enormous log file, it's also a ridiculous CPU usage! Check it out in htop.
If you open the log file you'll see it spamming errors about buffer chunks. Apparently, they got broken smh, so the agent can't read them and send away. Thus, high IO and CPU usage.
The solution is to stop the service:
sudo service google-cloud-ops-agent stop
Then clear all buffer chunks:
sudo rm -rf /var/lib/google-cloud-ops-agent/fluent-bit/buffers/
And delete log file if you want:
sudo rm -f /var/log/google-cloud-ops-agent/subagents/logging-module.log
Then start the agent:
sudo service google-cloud-ops-agent start
This helped me out.
Btw this issue is described here and it seems that Google "fixed" it since 2.7.0-1. Whatever they mean by it since we still faced it...

AWS EC2 User Data not working (Tried Installing and starting httpd via User Data)

The Following is my EC2 User Data:
#!/bin/bash
sudo yum update -y
sudo yum install -y httpd
sudo systemctl start httpd
sudo systemctl enable httpd
In Security Group SSH 22 Port and HTTP 80 Port is Open.
Yet when I try accessing http://public_ip_of_instance the HTTP Apache page doesn't load.
Also, on the Instance Apache is not installed when I checked sudo systemctl status httpd.
I then manually tried it on the EC2 Server and it worked. Then I removed it through yum remove as I wanted to see whether User Data works.
I stopped the Instance and started again but I observed that the User Data Script doesn't work as I am unable to access http page through browser and also on Instance http is not installed.
Where is the actual issue? Some months back this same thing worked on another instance I remember.

Your user data is correct. Whatever is happening with your website is not due to the user data code that you provided.
There could be many reasons it does not work. Public IP of the instance has changed, as always happens when you stop/start the instance. Instance may have per-existing software that clashes with httpd.

Here's some general advice on running UserData once or each startup.
Short answer as John mentioned in the comments EC2's only run the UserData (aka Bootstrap) script once on initalization.
The user data Bash/Powershell is Infrastructure-As-Code. You deploy the script and it installs and configures the machine.
This causes confusion with everyone starting AWS. When you think about it though it doesn't make sense to run the UserData script each time when the PCs already been configured.
What people do often instead is make "Golden Images" (aka Amazon Machine Images - AMI's) of pre-setup EC2s, typically for PCs that take long time to install/configure. The beauty of this is you can setup AutoScaleGroups to use the images which saves any long installation during a scale up event.
Pro Tip: When developing an UserData script run through and test it manually on the EC2. Trust me its far quicker than troubleshooting unattended EC2 UserData errors.
Long answer: you can run the UserData on each boot of the machine using Mime multi-part file. A mime multi-part file allows your script to override how frequently user data is run in the cloud-init package.
https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/

For all those who will run into this problem, first of all check the log with the command:
sudo cat /var/log/cloud-init-output.log
then if you notice connection errors to the various repositories, the reason is because you don't have an internet connection. However, if once inside your EC2 you manage to launch the update and install commands, then the reason why they fail in the UserData is because your EC2 takes a few seconds to get the Internet connection and executes the commands before having it. So to solve this problem, just add this command after #!/bin/bash
#!/bin/bash
until ping -c1 8.8.8.8 &>/dev/null; do :; done
sudo yum update -y
...
This will prevent your EC2 from executing commands before an internet connection is established

Allow a bash script to run at boot in AWS Centos 7 instance

I need to create AWS CentOS 7 instance images for a customer, and need it to automatically send the ip and instance id to our AWS server every time the instance boots. For example, this is the very basic test version of the script I need to run:
#!/bin/bash
$serverIP=""
curl "https://$serverIP"/myphp.php?id='sentid'&ip='sentip'"
If the script is run directly, it works fine and is received by the server and processed there. But I can't get it to run at boot. I cannot put the script in the "User Data" directly due to security concerns as the customer can then see it easily, it needs to be in a script in the filesystem of the image.
I've tried several things that work fine on a physical Linux server, but not on AWS. I know profile.d runs every time someone logs in but over-sending like that is fine.
/etc/profile.d/myscript.sh
This stops the AWS instance from booting. Even just
#!/bin/bash/
echo "hello world"
prevents it from booting. The instance starts, but when you go to ssh into it you get 'Network Error: connection timed out', which is the standard error if you put a wrong ip in, or upset it by leaving a service like httpd enabled.
However, a blank bash script with just #!/bin/bash will allow the instance to start. Removing the script via user data usually makes it boot, sometimes it just dies.
The first thing I tried was crontab. I did:
crontab -e
#reboot /var/ook/myscript.sh
systemctl enable crond.service
But the instance wouldn't start. So I put "systemctl disable crond.service" in the User Data and one booted, but another still stayed dead. Myscript.sh was just another echo "doob" >> file which worked fine when run directly.
I tried putting in /etc/systemd/system/my-startup.service:
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/var/ook/writedood.sh
[Install]
WantedBy=multi-user.target
then:
systemctl enable my-startup.service
But this did nothing. My script "writedood.sh" was just echo "doob" >> ./file.txt ensuring file.txt was chmod 777. At least it didn't prevent the instance from starting.
To give context, an instance won't start if httpd is left enabled on shutdown, but will if you disable it in User Data.
I wanted to have a go at putting something in init.d but I'm not sure how to simply tell it to run a script once in the background, and given the plethora of success I've had so far with the instance not restarting, I'm not holding out much hope that that would work.
Thanks in advance!
EDIT::: I realised that sometimes AWS EC2 Instances Console is causing the problem where I can't ssh in after stopping and starting. It blanks the public ipv4 address when I click stop, but when I start, it puts the old address up and hangs. If I refresh the page, or uncheck/check the instance; the ip changes to the new address. This has caused much consternation.

Crontab worked if I placed the scripts and output file in different folders. It's very finicky; any errors, such as it not being able to write to the output file, and the instance won't start. I put startscript.sh in /usr/local/src, and output.out to /tmp/ to ensure there were no permissions problems, and now the instance starts and runs the script on boot.
I then realised that sometimes AWS EC2 Instances Console is causing the problem where I can't ssh in after stopping and starting. It blanks the public ipv4 address when I click stop, but when I start, it puts the old address up and hangs. If I refresh the page, or uncheck/check the instance; the ip changes to the new address. This has caused much consternation.

Wildfly 10 restart issue on AWS EC2

I am running my Wildfly 10.1.0 server on Linux OS on Amazon EC2 instance. I have written start and stop scripts for the server. Whenever I stop my server and re-start after some time I get the following exception -
WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "rapid.ear")]) - failure description: "WFLYSRV0137: No deployment content with hash dd66eee901c4bf79dd6659873df918e1b639bc1b is available in the deployment content repository for deployment 'rapid.ear'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
When I remove the entry for that WAR from standalone.xml I am able to restart the server, but I need a more permanent solution.
The start script written is -
nohup /data/wildfly-10.1.0.Final/bin/standalone.sh -Djavax.net.ssl.trustStore="/usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts" --server-config=standalone.xml &
And the stop script is -
sh /data/wildfly-10.1.0.Final/bin/jboss-cli.sh --connect command=:shutdown

It may not be quite as efficient in terms of I/O but if you've got a standalone instance I've just taken advantage of the deployment scanner. I have:
<subsystem xmlns="urn:jboss:domain:deployment-scanner:2.0">
<deployment-scanner name="myapp" path="/home/wildfly/sites/www.mysite.tld" scan-interval="60000" auto-deploy-exploded="true"/>
</subsystem>
in my standalone-full.xml (you may or may not need the "-full" part). I then deploy my webapp to "/home/wildfly/sites/www.mysite.tld" and can update it as needed. The code I show only reads the directory once a minute so it isn't terrible on I/O.
Again, your deployment may be different than mine.

Can I configure Linux swap space on AWS Elastic Beanstalk?

Can I configure Linux swap space for an AWS Elastic Beanstalk environment?
I don't see an option for it in the console. From looking at /proc/meminfo on my running instances in my environment MemAvailable is looking quite low despite there being quite high Inactive values. I suspect there are a few dormant background processes that it would do no harm to page out, and would free up a non-trivial portion of the limited physical memory on the t2.nano I'm using.

I figured out how to do this using the .ebextensions config folder in my Tomcat web app.
Add a file .ebextensions/swap.config:
container_commands:
add-swap-space:
command: "/bin/bash .ebextensions/scripts/add-swap-space.sh"
ignoreErrors: true
Add a file .ebextensions/scripts/add-swap-space.sh:
#!/bin/bash
set -o xtrace
set -e
if grep -E 'SwapTotal:\s+0+\s+kB' /proc/meminfo; then
echo "Positively identified no swap space, creating some."
dd if=/dev/zero of=/var/swapfile bs=1M count=512
/sbin/mkswap /var/swapfile
chmod 000 /var/swapfile
/sbin/swapon /var/swapfile
else
echo "Did not confirm zero swap space, doing nothing."
fi
More docs: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
After a while running this allowed 150MiB to be swapped out on my t2.nano instance which runs the Elastic Beanstalk Java Tomcat platform with default heap options. From what I can see there is no ongoing paging while the application runs. It looks like some dormant data has been pushed to swap and the page cache is significantly larger (up from 30MiB to 180MiB).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js