Elastic Beanstalk rolling deployment issue: unexpected downtime

We just migrated our Elastic Beanstalk environments from PHP 7.3 running on 64bit Amazon Linux to PHP 7.4 running on 64bit Amazon Linux 2 and we are seeing the following problems:
When deploying code to the environment with a Rolling deployment policy, we get 3-4 seconds of 502 Bad Gateway responses before the servers start working again. We did not see this downtime on the previous generation of Linux.
Also: the Application Load Balancer clears all sessions and signs out all users, even though stickiness is enabled with a load-balancer-generated cookie. We did not see this stickiness issue on the previous generation of Linux.
This happens with both Apache and Nginx.
Any ideas on how to resolve this?

I don't know about the Bad Gateway issue, but I recently ran into this loss-of-sessions issue. The sessions are lost because Amazon Linux 2 now runs PHP through the PHP-FPM service under systemd. Session tracking in the default PHP/Elastic Beanstalk configuration is done using files in the /tmp directory. However, systemd's "PrivateTmp" feature is enabled, which creates a unique directory for the PHP-FPM service to use while it is running. As soon as the PHP-FPM service is stopped, systemd deletes this special "private" /tmp, which deletes all the session files.
Whenever PHP ElasticBeanstalk deploys a new version, this PHP-FPM service is stopped and restarted, resulting in the loss of sessions.
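You can confirm this on a running instance before changing anything; the unit file shipped with the platform sets the flag (a quick check, not part of the fix itself):
$ systemctl cat php-fpm.service | grep PrivateTmp
PrivateTmp=true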
There are a couple of options to address this issue:
-> Configure PHP to use something like memcached/redis/etc. to manage sessions instead of the filesystem (see the sketch after the hook script below). This is probably the most secure solution.
Or,
-> Configure your Amazon Linux 2 ElasticBeanstalk instances to handle these session files in the /tmp directory proper, instead of the "private" tmp directory provided by systemd.
This can be achieved by adding the following post-deploy configuration script into your project under the path: .platform/hooks/postdeploy/phpfpm_noprivatetmp.sh
#!/bin/bash -e
# change PrivateTmp from true to false, then reload/restart the systemd service
sed -i 's/PrivateTmp=true/PrivateTmp=false/' /usr/lib/systemd/system/php-fpm.service
# wait a moment...
sleep 2
sudo systemctl daemon-reload
# wait a moment...
sleep 2
sudo systemctl restart php-fpm.service
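One extra step worth noting: Elastic Beanstalk only runs platform hooks that are marked executable, so after adding the file run this from your project root before committing:
$ chmod +x .platform/hooks/postdeploy/phpfpm_noprivatetmp.sh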
This will disable the "PrivateTmp" feature in systemd, causing the session files to be stored in the "real" /tmp directory where they won't get deleted automatically. Deploying new versions of your site will no longer cause everyone to get logged out.
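If you go with the first option instead (external session storage), the PHP side mostly comes down to pointing the session handler at your cache. A minimal sketch of the php.ini directives involved, assuming the phpredis extension is installed and an ElastiCache Redis endpoint named my-redis.example.cache.amazonaws.com (the extension choice, the endpoint name, and the ini file location are all assumptions, not part of the original answer):
; e.g. dropped into an ini override such as /etc/php.d/session-redis.ini
session.save_handler = redis
session.save_path = "tcp://my-redis.example.cache.amazonaws.com:6379"
With sessions stored off-instance, neither PrivateTmp nor instance replacement during a rolling deploy will log users out.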

Elastic Beanstalk: Create a new environment: .ebextensions with SSL certificate fails to start (tomcat-single-instance)
I am trying to create a new environment with the current production WAR package.
New instance deployment fails and the environment comes up with "Green" status. We originally followed this sample to create the .ebextensions (https://s3.amazonaws.com/elasticbeanstalk-single-instance-ssl-demo/tomcat-single-instance.zip) and extended it as described here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/https-singleinstance-tomcat.html
New Platform: Managed, Tomcat 8.5 with Java 8 on 64bit Amazon Linux
Platform version: 3.4.1(Recommended)
Error:
httpd24-tools conflicts with httpd-tools-2.2.34-1.16.amzn1.x86_64
mod24_ssl conflicts with 1:mod_ssl-2.2.34-1.16.amzn1.x86_64
httpd24 conflicts with httpd-2.2.34-1.16.amzn1.x86_64
To resolve this error, I replaced
packages:
  yum:
    mod_ssl: []
with
packages:
  yum:
    mod24_ssl.x86_64: []
But that caused this error:
Httpd configuration detected in the '.ebextensions/httpd' directory. AWS Elastic Beanstalk will no longer manage the httpd configuration for this environment.
Executing: /usr/sbin/apachectl -t -f /var/elasticbeanstalk/staging/httpd/conf/httpd.conf
httpd: Syntax error on line 21 of /var/elasticbeanstalk/staging/httpd/conf/httpd.conf: Include/IncludeOptional: No matches for the wildcard '*.conf' in '/etc/httpd/conf.d/elasticbeanstalk', failing
Failed to execute '/usr/sbin/apachectl -t -f /var/elasticbeanstalk/staging/httpd/conf/httpd.conf'
Failed to execute '/usr/sbin/apachectl -t -f /var/elasticbeanstalk/staging/httpd/conf/httpd.conf' (Executor::NonZeroExitStatus)
AWS is asking us to replace the current production server (Amazon Linux/2.3.1) without delay, as it is "Retired". I have posted this issue on the AWS Forum as well. Please help.
As indicated in the AWS documentation:
Starting with Tomcat platform version 3.0.0 configurations, which were released with the Java with Tomcat platform update on May 24, 2018, Apache 2.4 is the default proxy of the Tomcat platform.
After digging into the problem, as can be seen from the comments and the companion chat, the actual solution was to either create or clone the environment, the idea being to start from a fresh Beanstalk environment on platform version 3.4.2 without any customization.
Then, to avoid the mentioned problem with SSL, the .ebextensions directory should only include a suitable ssl.conf and the environment.config script provided in the sample zip linked in the question, without the packages section, because mod_ssl is already installed in the Beanstalk image.
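For reference, the resulting .ebextensions content is roughly the following (file names taken from the sample zip linked in the question; treat this as a sketch, since the exact layout depends on that sample):
.ebextensions/
    environment.config    (from the sample, with the packages: section removed)
    ssl.conf              (the SSL configuration from the sample)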
In this specific case, there were some additional problems related to the creation of the files required for logging. After adjusting the path to one the application is able to write to, the default for Tomcat on Beanstalk, /var/log/tomcat8, everything seems to work properly.
Save yourself pain: do not configure SSL in your Tomcat server, do it on an AWS Elastic Load Balancer (ELB).

AWS Elastic Beanstalk with Docker: incorrect version

I'm deploying a Docker image from GitHub to AWS Elastic Beanstalk using Travis. That part goes OK; the actual deployment exits with 0 and there is a .zip file in the S3 bucket.
The issue is that, since this is my first time using AWS, I created the app using the Sample Application (the code is deployed from GitHub), and after the deployment I get the health status Degraded (red exclamation sign) with this message:
ERROR
During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.
If I go to Causes I find this:
Application deployment failed at 2020-05-01T16:01:58Z with exit status 1 and error: Engine execution has encountered an error.
Incorrect application version "travis-e55e05342a8cc16f3f28f8e184735667a9531ffa-1588311901" (deployment 4). Expected version "Sample Application" (deployment 1).
I even deleted the sample application and re-deployed the one that was uploaded, and got that particular error. As you can see from the last message, I've deployed this 3 times already, getting the same result.
Finally, I downloaded the zip file from the S3 bucket and found inside basically the src and public folders, along with all the files in the root folder such as package.json, .gitignore, all the Docker files, etc.
EDIT
I created two separate repos in GitHub to test this.
The first repo is a static page in a Docker container, quite simple. I create an environment in EB and start everything with the sample app. Then I push the changes to GitHub, Travis does its thing and deploys the app to AWS. This works fine and the app's environment is updated with no errors. This is the repo:
https://github.com/rhernandog/docler-static-page-aws
The second repo is a simple React app. Same procedure: create the environment in EB with the sample app, push the code to GitHub, Travis does its thing and deploys to AWS. This fails and I keep getting the same error:
Environment health has transitioned from Info to Degraded. Command failed on all
instances. Incorrect application version found on all instances. Expected version
"Sample Application" (deployment 1). Application update failed 1 second ago and
took 2 minutes.
This is the repo for the react app:
https://github.com/rhernandog/react-docker-awseb
In terms of Docker, everything works fine on my local machine.
EDIT 2
Based on @stefansundin's suggestion I re-deployed the app to EB and checked the logs. I ended up looking at the full logs for more information and found this:
/var/log/cfn-hup.log
2020-05-14 17:07:42,605 [WARNING] Action for aws-eb-command-handler exited with 1, returning FAILURE
The only place where I found an error was in the engine log file:
/var/log/eb-engine.log
2020/05/14 17:07:42.514601 [INFO] Executing instruction: Docker Specific Build Application
2020/05/14 17:07:42.514605 [INFO] start build docker app
2020/05/14 17:07:42.514615 [INFO] fetch image name
2020/05/14 17:07:42.514639 [INFO] authenticate with ECR if the image is in an ECR repo
2020/05/14 17:07:42.514644 [INFO] pull docker image if update is not false in dockerrun.aws.json
2020/05/14 17:07:42.514657 [INFO] Running command /bin/sh -c docker pull node:12-alpine AS builder
2020/05/14 17:07:42.558923 [ERROR] "docker pull" requires exactly 1 argument.
So basically this is complaining about this line in the Dockerfile: FROM node:12-alpine AS builder. You can see the whole file in the repo: https://github.com/rhernandog/react-docker-awseb/blob/master/Dockerfile
The point is: why doesn't this happen on my local machine? And how can I actually get the files from the build step and copy them into the nginx folder?
That is actually the only error I found in the log files.
I solved the issue here:
AWS Elastic Beanstalk Docker Does not support Multi-Stage Build
It is a stage-naming problem with the multi-stage Dockerfile. Just use an unnamed stage (and reference it by index in COPY --from).
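A minimal sketch of what that unnamed-stage workaround looks like for a React app (the paths and the nginx image are assumptions for illustration, not taken from the linked repo):
# build stage -- deliberately left unnamed so the Beanstalk Docker engine does not choke on "AS builder"
FROM node:12-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# serve stage -- copy the build output from stage 0 (the first, unnamed stage)
FROM nginx:alpine
COPY --from=0 /app/build /usr/share/nginx/html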
I also got a similar error in my node app:
Incorrect application version "travis-e55e05342a8cc16f3f28f8e184735667a9531ffa-1588311901" (deployment 4). Expected version "Sample Application" (deployment 1)
The issue turned out to be with my build and deployment scripts; once those were corrected (debugged in Jenkins), the application deployed successfully to Beanstalk with no errors.
So the problem was not with Beanstalk or the app version but with the build mechanism. Something to look into when nothing else works :)
I had the same issue for a Java app in a Docker container.
I tried all the recommendations from this topic and the links from this topic, and nothing helped.
In the end, the following actions helped:
Enable the enhanced health panel: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-enable.html#health-enhanced-enable-console
Go to the enhanced health panel of the desired environment
Select the instance that crashed due to this "version" issue and click reboot
Additionally:
In one case, I had to delete all previous application versions (section on the left panel) and push a new one, and only after that apply the recommendations above.
Also make sure you have sufficient rights to deploy (CodePipeline/deployment).
AWS Docs say that
To solve this issue, start another deployment. You can redeploy a previous version that you know works, or configure your environment to ignore health checks during deployment and redeploy the new version to force the deployment to complete.
You can also identify and terminate the instances that are running the wrong application version. Elastic Beanstalk will launch instances with the correct version to replace any instances that you terminate. Use the EB CLI health command to identify instances that are running the wrong application version.
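For example, with the EB CLI configured for the environment, the health command shows per-instance status, and its deployments view includes the deployment ID and version each instance is running, which makes the stragglers easy to spot (--refresh keeps the view updating while you watch):
$ eb health --refresh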
Can you try to delete the instances that run your application and start a fresh install?
Also, you can use CodePipeline to deploy your code to Elastic Beanstalk: use your S3 folder for the source stage, skip the build stage since your code is already built on Travis, and use the deploy stage to install your new app to Elastic Beanstalk. There might be some misconfiguration while installing the new app to your environment.
I suggest you terminate your instances and start new instances. Sorry if I got your question wrong.
I haven't used Docker on Elastic Beanstalk. When my Ruby deployments on Elastic Beanstalk fail, I usually find the problem by requesting the last 100 lines from the logs. If you navigate to "Logs" -> "Request Logs" -> "Last 100 Lines", that may help you.
If that fails, I SSH in to the instance and look at the logs in /var/log. docker ps and docker logs may also help you.
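For example, after SSHing into the instance (the container ID is a placeholder):
$ sudo docker ps -a                 # list containers, including ones that exited
$ sudo docker logs <container-id>   # dump that container's output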
While creating a new web server environment, select "Docker running on 64bit Amazon Linux" as the platform branch and it will work.

Django API works locally but not when deployed on Elastic Beanstalk

I'm working with Django Rest Framework and Django Rest Framework JWT, but I'm running into an issue in regards to local behavior vs external behavior.
When I make a POST request to get a JWT token, everything works as desired both locally and on my EC2 instance. However, once I have the token, when I make a request to my server that requires authentication, only my local server returns the expected response. On my deployed server, I get the following error:
{"detail":"Authentication credentials were not provided."}
What I've tried so far:
Editing httpd.conf by SSHing into my server and enabling WSGIPassAuthorization (saw a similar post here and tried the solution).
What could be causing this behavior? My local machine and my deployed code are identical, leading me to believe that this has something to do with server-side configuration.
All help is appreciated. Thanks!
You mentioned ElasticBeanstalk.
You can add this to an .ebextensions config file; container commands are executed during deployment:
container_commands:
  01_wsgipass:
    command: 'echo "WSGIPassAuthorization On" >> ../wsgi.conf'
Simply restarting Apache after enabling WSGIPassAuthorization fixed my error. For those of you who encounter something similar, here's what I did:
SSH into the server
Navigate to where your httpd.conf file is stored (in Apache on Amazon Linux this is usually
/etc/httpd/conf/httpd.conf)
Edit httpd.conf, adding WSGIPassAuthorization On.
Restart Apache by running sudo service httpd restart.
This problem usually occurs when you configure WSGI with Apache on an EC2 instance.
Basically it's a problem in the Apache configuration;
it has nothing to do with AWS EC2.
Apache by default does not pass Authorization headers through to the application, so in order to make that happen we need to configure it.
For Ubuntu, edit /etc/apache2/apache2.conf
and add the following line:
WSGIPassAuthorization On

How to configure an Elastic IP with a Django app in AWS?

I am building an app using Django on EC2 (Ubuntu) and I have associated an Elastic IP with my instance.
I have done the following steps:
1. Created an Ubuntu instance in the EC2 free tier.
2. Installed Python.
3. Installed pip.
4. Installed Django.
5. Created a Django project using django-admin startproject.
6. Ran the server using this command: python manage.py runserver 0.0.0.0:80
7. Created an Elastic IP and associated it with the instance.
8. Configured the security group inbound settings to allow HTTP (port 80) from 0.0.0.0/0.
9. Able to reach my project from any browser.
But the problem is that when I close the PuTTY session in which I ran the runserver command, the Django project also stops. I did not stop it manually.
Please help me keep it running after closing the PuTTY session as well.
Thanks,
Kripa Sharma
Take a look at this Answer
I highly recommend that you start using Elastic Beanstalk (Python platform) to take care of all these steps for you. It is very simple to set up, and there is no need to worry about any of the steps you listed.
You can use these instructions to see how to deploy a Django app in less than 5 minutes.
The problem
You are trying to persist the debug server for a remotely deployed application.
You probably need to review the runserver command documentation. Here are the relevant parts:
django-admin runserver [addrport]
Starts a lightweight development Web server on the local machine. By default, the server runs on port 8000 on the IP address 127.0.0.1. You can pass in an IP address and port number explicitly.
...
DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that’s how it’s gonna stay. We’re in the business of making Web frameworks, not Web servers, so improving this server to be able to handle a production environment is outside the scope of Django.)
A webserver
Having skimmed the above docs, you may want to look at "How to deploy with WSGI" section, which gives a few recommendations for commonly used Web servers. My favorite, Gunicorn, includes a usage example:
$ pip install gunicorn
$ gunicorn myproject.wsgi
Having decided on and installed a web server, you'd need to "daemonize" it and expose it to the world.
The former is usually done by creating a service on your OS; for Ubuntu it would be either upstart or systemd depending on the version. The Gunicorn docs have examples for both, and a minimal sketch follows at the end of this section.
The latter is usually achieved with an HTTP server/proxy such as nginx or Apache httpd. And again, Gunicorn has an example for us.
You can see why I like it so much ☺️
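To make the "daemonize" step concrete, here is a minimal sketch of a systemd unit for Gunicorn, assuming a systemd-based Ubuntu (the unit name, user, paths and the myproject module are illustrative assumptions; Gunicorn's own deployment docs have a more complete version):
# /etc/systemd/system/gunicorn.service
[Unit]
Description=gunicorn daemon for myproject
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/myproject
ExecStart=/home/ubuntu/myproject/venv/bin/gunicorn --bind 127.0.0.1:8000 myproject.wsgi
Restart=on-failure

[Install]
WantedBy=multi-user.target
Enable and start it with sudo systemctl enable --now gunicorn, then put nginx or Apache in front of 127.0.0.1:8000.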
Epilogue
While it is technically possible to run the debug server as a service or even in a terminal multiplexer such as GNU screen or tmux, that is not a recommended or stable long-term solution.
That said, these tools are very useful to know about, so read up on them and learn to use them; they will be invaluable in your toolset in the future, for example to avoid accidentally terminating a long-running command (such as a migration).

VSTS Task: Windows Machine File Copy: System Error 53

I'm trying to make a release from VSTS to a VM (running on AWS) that is running IIS. For that I use three tasks:
Windows Machine File Copy
Manage IIS App
Deploy IIS App
Before the release I run a build pipeline that gives me an artifact containing the web app (webapp.zip).
When I manually put it on the server I can run steps 2 and 3 of my release and the application works. The problem is that I can't get the Windows Machine File Copy task to work. It always throws an exception giving 'System Error 53: The network path was not found'. Of course the machines are not domain-joined, because I'm running my release on VSTS and need the files on an AWS VM. I tried opening port 445 (for file sharing) and made sure the user has rights to the destination path on the target machine.
So my question is: how can I actually move the files from VSTS to the AWS VM if the two machines are not joined?
Use the FTP Upload or cURL Upload step/task instead.
Regarding how to create an FTP site, you can refer to this article: Creating a New FTP Site in IIS 7.
Disclaimer: this answer merely explains how to fulfill the requirements to use the Windows Machine File Copy and Manage/Deploy IIS tasks.
Please always be concerned about the security of your target hosts; hardening and a security assessment are absolutely necessary.
As noted in the comments, you need to protect the deployment channel from the outside world.
Answer:
In order to use the Windows Machine File Copy task you need to:
on the target machine (the one running IIS), enable File and Printer Sharing by running the following command from an administrative command prompt:
netsh advfirewall firewall set rule group="File and Printer Sharing" new enable=yes
make sure that PowerShell 4 or more recent is installed on the target machine; the following, executed from a PS command prompt, prints the version installed on the local machine:
PS> $PSVersionTable.PSVersion
To get PowerShell 5 you could, for example, install WMF 5;
the target machine must also have .NET Framework 4.5 or more recent installed.
For the other two tasks (Manage/Deploy IIS), both require a WinRM HTTPS listener on the target machine. For a development deployment scenario you could follow these steps:
download the ConfigureWinRM.ps1 PowerShell script from the official VSTS Tasks GitHub repository;
enable the RemoteSigned PowerShell execution policy from an administrative PowerShell command prompt:
PS> Set-ExecutionPolicy RemoteSigned
run the script with the following arguments:
PS> ConfigureWinRM.ps1 FQDN https
Note that FQDN is the complete domain name of your machine as it is reached by the VSTS task, e.g. myhostname.domain.example .
Note also that this script downloads two executables (makecert.exe and winrmconf.cmd) from the Internet, so this machine must have an Internet connection. Otherwise, just download those two files, place them next to the script, and comment out the Download-Files invocation in the script.
Now you have enabled a WinRM HTTPS listener with a self signed certificate. Remember to use the "Test Certificate" option (which ironically means to not test the certificate, better name would have been "Skip CA Check") for those two tasks.
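To double-check that the HTTPS listener is actually in place, you can run this on the target machine (a verification step, not part of the original walkthrough):
winrm enumerate winrm/config/listener
It should print a listener with Transport = HTTPS and the certificate thumbprint the script configured.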
In a production deployment scenario you may want to use a properly signed certificate instead.
Windows File Copy is designed to work on the same network; enabling it over the Internet would open your server up to attack. It's designed for internal networks. FTP would also be a significant security risk unless managed properly.
The easiest way to move forward would be to run an agent on the AWS VM that you want to release to. The agent will then download the artifacts to the AWS VM and run whatever tasks you need to install.
This allows you to run tasks on the local machine without opening it up to security risks.
If you have multiple machines to manage in AWS, you can easily create a local network that allows your single agent to use Windows File Copy to push files to multiple VMs without risk.