When using Elastic Beanstalk with CNAME swapping for zero downtime deployments, DNS caching (clients not respecting TTL) causes some clients to continue sending traffic to the old environment (for up to several days).
When using Elastic Beanstalk with Route53 Aliases for zero downtime deployments, does DNS caching remain an issue?
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.CNAMESwap.html
it says
you deploy the new version to a separate environment, and then swap CNAMEs of the two environments to redirect traffic to the new version instantly.
and
However, do not terminate your old environment until the DNS changes have been propagated and your old DNS records expire. DNS servers do not necessarily clear old records from their cache based on the time to live (TTL) you set on your DNS records.
Isn't that a contradiction?
I think DNS caching is still an issue.
How can I migrate the database to a new schema version while clients of the older application version still exist?
I guess I can only migrate the database when the schema works for both versions.
I've found a good article here.
http://fbrnc.net/blog/2016/05/green-blue-deployments-with-aws-lambda-and-cloudformation
but it uses CloudFormation, not Elastic Beanstalk.
Unfortunately it does. The recommended way now is to use rolling updates.
I haven't tested this yet, but I thought this was why they implemented the "Swap Environment URLs" action, rather than doing it in Route53.
Old database endpoint : old.cy336nc8sq5l.us-east-1.rds.amazonaws.com
New database endpoint : new.cy336nc8sq5l.us-east-1.rds.amazonaws.com
Above endpoints are automatically created by AWS, at the time of creation of RDS instance
I have tried setting up a CNAME for old.cy336nc8sq5l.us-east-1.rds.amazonaws.com with the value new.cy336nc8sq5l.us-east-1.rds.amazonaws.com, but it did not work. For this I had to create a new hosted zone in Route53 named cy336nc8sq5l.us-east-1.rds.amazonaws.com.
However, if I set up a CNAME in another hosted zone for a URL like abc.example.com with the value new.cy336nc8sq5l.us-east-1.rds.amazonaws.com, it works like a charm. The old RDS URL has been used in multiple applications, so I cannot take the risk of abandoning it completely; the best way is to use some kind of redirection.
In addition, no CNAME under the cy336nc8sq5l.us-east-1.rds.amazonaws.com hosted zone is working.
How can I fix this? Please also suggest the best practice for redirecting RDS traffic. I know that for the new DB endpoint I will create a new custom CNAME and use that going forward, rather than just using the default one. All suggestions are welcome :)
You can't add any records for the domain cy336nc8sq5l.us-east-1.rds.amazonaws.com, because you don't control it. In general you can create a hosted zone for any domain (like google.com), but it won't take effect unless you change the NS and SOA records at the original DNS provider to point to yours, and you can't do that with AWS RDS domains. You can confirm this by running:
dig +short -t ns cy336nc8sq5l.us-east-1.rds.amazonaws.com
If the above returns your NS records, then you control that domain.
To have this kind of flexibility in the future, I would suggest creating a private hosted zone such as mydb.com with a CNAME record like master.mydb.com pointing to old.cy336nc8sq5l.us-east-1.rds.amazonaws.com. When you want to switch to another endpoint, just change it in Route53; after the TTL expires, connections will start going to the new endpoint. Since you are making a change anyway, it's better to start using this approach now.
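For example, once such a private zone exists, switching the endpoint is a single record change. A minimal sketch, assuming a hypothetical hosted-zone ID and the mydb.com zone from above:

# Point master.mydb.com at the new RDS endpoint (UPSERT creates or updates the record).
aws route53 change-resource-record-sets \
  --hosted-zone-id ZEXAMPLE123 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "master.mydb.com",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{"Value": "new.cy336nc8sq5l.us-east-1.rds.amazonaws.com"}]
      }
    }]
  }'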
Also, for your case, after you switch to the new endpoint you can check the connection count on the old DB to see whether it is still referenced somewhere, and by running show processlist; you will be able to see which IPs are still using it.
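For instance, if the engine is MySQL-compatible, you can run something like this from a host that can reach the old endpoint (the admin user is a placeholder):

# List current connections on the old endpoint to see who is still using it.
mysql -h old.cy336nc8sq5l.us-east-1.rds.amazonaws.com -u admin -p -e "SHOW PROCESSLIST;"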
The bottom line is that you are going to have to update all 30 applications to use the new DB endpoint. If you are going to be deleting databases & recreating them like this regularly, then configure your databases to use a name in a zone you control, and create a CNAME to whatever database endpoint is current.
You may be able to create a temporary solution by adding an entry to /etc/hosts (assuming your clients are running linux - I believe this is also possible on Windows, but it has been a long time) that maps the current IP for the new database to the old hostname. But this is probably just as much work as updating the application to use the new database. It will also fail if you are running a multi-AZ database and have a failover event.
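As a rough sketch of that temporary workaround (the IP below is a placeholder; look up the current one first with dig +short new.cy336nc8sq5l.us-east-1.rds.amazonaws.com):

# /etc/hosts -- map the old hostname to the new database's current IP
203.0.113.10   old.cy336nc8sq5l.us-east-1.rds.amazonaws.com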
Changing your DB identifier can help in some way.
Select your cluster -> Modify -> change DB cluster identifier
You will keep your old database under a different endpoint, and can then change the new DB's identifier so that it takes over the old endpoint name.
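A hedged CLI sketch of that rename, assuming a plain RDS instance whose identifier is old (for an Aurora cluster it would be modify-db-cluster with --new-db-cluster-identifier instead):

# Move the old instance out of the way, freeing up the "old" identifier...
aws rds modify-db-instance \
  --db-instance-identifier old \
  --new-db-instance-identifier old-retired \
  --apply-immediately

# ...then rename the new instance so it takes over the old endpoint name.
aws rds modify-db-instance \
  --db-instance-identifier new \
  --new-db-instance-identifier old \
  --apply-immediately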
But I love the /etc/hosts solution, as it is simple and safe.
I am hosting a Django site on Elastic Beanstalk. I haven't yet linked it to a custom domain and used to access it through the Beanstalk environment domain name like this: http://mysite-dev.eu-central-1.elasticbeanstalk.com/
Today I did some stress tests on the site which led it to spin up several new EC2 instances. Shortly afterwards I deployed a new version to the Beanstalk environment via my local command line while 3 instances were still running in parallel. The update failed due to timeout. Once the environment had terminated all but one instance I tried the deployment again. This time it worked. But since then I cannot access the site through the EB environment domain name anymore. I always get a "took too long to respond" error.
I can access it through my ec2 instance's IP address as well as through my load balancer's DNS. The beanstalk environment is healthy and the logs are not showing any errors. The beanstalk environment's domain is also part of my allowed hosts setting in Django. So my first assumption was that there is something wrong in the security group settings.
Since the load balancer is getting through, it seems the issue is with the Beanstalk environment's domain. As I understand it, the Beanstalk domain name points to the load balancer, which then forwards to the instances. So could it be that the environment update, in combination with new instances spinning up, has somehow corrupted the connection? If yes, how do I fix it, and if not, what else could be the cause?
Being a developer and newbie to cloud hosting, my understanding is fairly limited in this respect. My issue seems similar to this one: Elastic Beanstalk URL root not working - EC2 Elastic IP and Elastic IP Public DNS working, but it hasn't helped me further.
Many Thanks!
Update: After one day everything is back to normal. The environment URL works as before, as if the dependencies had recovered overnight.
Obviously a server can experience downtime, but since the site worked fine when accessing the EC2 instance IP and the load balancer DNS directly, I am still a bit puzzled about what's going on here.
If anyone has an explanation for this behaviour, I'd love to hear it.
Otherwise, for those experiencing similar issues after a botched update: Before tearing out your hair in desperation, try just leaving the patient alone overnight and let the AWS ecosystem work its magic.
I have previously seen it done by having one EC2 instance running HAProxy (configured via a JSON file/Lambda function), which in turn controlled the traffic with sticky sessions into two separate Elastic Beanstalk applications. So we have two layers of load balancing.
However, this has a few issues, one being that testing several releases becomes expensive, as it requires more and more EB applications.
By canary release, I mean, being able to release to only a percentage of traffic, to figure out any errors that escaped the devs, the review process, and the QA process, without affecting all traffic.
What would be the best way to handle such a setup with AWS resources and not break the bank? :)
I found this Medium article that explains the usage of a passive autoscaling group, where you deploy the canary version into it and monitor the statistics. Once you are satisfied with the result, you can change the desired count of the canary autoscaling group to 0 and perform a rolling upgrade on the active autoscaling group.
Here is the link to the article: https://engineering.klarna.com/simple-canary-releases-in-aws-how-and-why-bf051a47fb3f
The way you would achieve canary testing with Elastic Beanstalk is to:
Create a 2nd beanstalk environment to which you deploy the canary release
Use a Route53 weighted routing policy to send a percentage of the DNS requests to your canary environment (see the sketch after this list).
If you're happy with the performance of the canary you can then route 100% of the traffic to the canary env, etc.
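A minimal sketch of the weighted record pair, assuming a zone you control, placeholder environment CNAMEs, and roughly 10% of DNS answers going to the canary:

aws route53 change-resource-record-sets \
  --hosted-zone-id ZEXAMPLE456 \
  --change-batch '{
    "Changes": [
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "app.example.com", "Type": "CNAME", "TTL": 60,
        "SetIdentifier": "stable", "Weight": 90,
        "ResourceRecords": [{"Value": "app-prod.eu-central-1.elasticbeanstalk.com"}]}},
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "app.example.com", "Type": "CNAME", "TTL": 60,
        "SetIdentifier": "canary", "Weight": 10,
        "ResourceRecords": [{"Value": "app-canary.eu-central-1.elasticbeanstalk.com"}]}}
    ]}'

Shifting traffic is then just a matter of re-running the call with different weights (e.g. 0/100 to complete the cutover).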
Something to keep in mind with DNS routing is that weighted routing is not an exact science, since clients cache DNS based on the TTL you set in Route53. In the extreme scenario where you have e.g. only one single client calling your Beanstalk environment (such as a single web server) and the TTL is set to 5 minutes, it could happen that the switching between environments only happens every 5 minutes.
Therefore for weighted routing it is recommended to use a fairly low TTL value. Additionally having many clients (e.g. mobile phones) works better in conjunction with DNS routing.
Alternatively, it might be possible to create a separate LB in front of the two Beanstalk environments that balances requests between them. However, I'm not 100% sure if an LB can sit in front of other (Beanstalk) LBs. I suspect the answer is no, but I haven't tried it yet.
Modifying the autoscaling group in Elastic Beanstalk directly is not possible, since the LB is managed by Beanstalk, and Beanstalk can decide to revert changes you made manually on the LB. Additionally, Beanstalk does not allow you to deploy to a subset of instances while keeping the older version on another subset.
Hope this helps.
Traffic splitting is supported natively by Elastic Beanstalk.
Be sure to select a "high availability" config preset when creating your application environment (by clicking on "configure more options"), as this will configure a load balancer for your env.
Then edit the "Rolling updates and deployments" section of your environment and choose "Traffic splitting" as your deployment strategy.
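If you prefer not to click through the console, the same policy can be set via option settings; a sketch with a placeholder environment name (the namespaces below are the Elastic Beanstalk traffic-splitting ones, but double-check them against the current docs for your platform):

aws elasticbeanstalk update-environment \
  --environment-name my-env \
  --option-settings \
    Namespace=aws:elasticbeanstalk:command,OptionName=DeploymentPolicy,Value=TrafficSplitting \
    Namespace=aws:elasticbeanstalk:trafficsplitting,OptionName=NewVersionPercent,Value=10 \
    Namespace=aws:elasticbeanstalk:trafficsplitting,OptionName=EvaluationTime,Value=10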
We currently have our production Elasticsearch on AWS. Nightly we update production with new base data, and then we run scripts to merge the new base with current production.
Now, this works alright, but production is down while this is happening. So I thought I could do it all on a staging Elasticsearch environment on AWS and then, when it's done, just somehow switch it to production.
So here is my flow.
spin up new elastic search instance (staging)
populate data (staging)
run scripts (merging production to staging)
switch somehow
remove/delete/shutdown old production
I looked at AWS Route 53 and this looks promising: basically, fiddle with the DNS settings so that "productionelastic" points to staging, and then shut down the production instance.
Is there anything else I can do? Also, will the Route 53 idea work?
You can use Amazon Route 53 health checks and DNS failover to route requests to the healthy Elasticsearch service while the other one is undergoing maintenance:
If you have multiple resources that perform the same function, for example, web servers or email servers, and you want Amazon Route 53 to route traffic only to the resources that are healthy, you can configure DNS failover by associating health checks with your resource record sets. If a health check determines that the underlying resource is unhealthy, Amazon Route 53 routes traffic away from the associated resource record set. For more information, see Configuring DNS Failover.
Using this service you can switch between both instances according to their availability. See Configuring DNS Failover
I used an IIS reverse proxy rule.
create the ES instance
wait for it to be ready
run a PowerShell script to update a fake website's rewrite rule to point to the new instance
then use the fake website in the production code
I will use Route53 when I have someone to manage it for me.
Thanks.
How do you put up a maintenance page in AWS when you want to deploy new versions of your application behind an ELB? We want to have the ELB route traffic to the maintenance instance while the new auto-scaled instances are coming up, and only "flip over" to the new instances once they're fully up. We use auto-scaling to bring existing instances down and new instances, which have the new code, up.
The scenario we're trying to avoid is having the ELB serve traffic to new EC2 instances while also serving up the maintenance page. Since we don't have sticky sessions enabled, we want to prevent the user from being flipped back and forth between the maintenance-mode page and the application deployed in an EC2 instance. We also can't just scale up (say from 2 to 4 instances and then back to 2) to introduce the new instances, because the code changes might involve database changes which would be breaking changes for the old code.
I realise this is an old question but after facing the same problem today (December 2018), it looks like there is another way to solve this problem.
Earlier this year, AWS introduced support for redirects and fixed responses to Application Load Balancers. In a nutshell:
Locate your ELB in the console.
View the rules for the appropriate listener.
Add a fixed 503 response rule for your application's host name.
Optionally provide a text/plain or text/html response (i.e. your maintenance page HTML).
Save changes.
Once the rule propagates to the ELB (took ~30 seconds for me), when you try to visit your host in your browser, you'll be shown the 503 maintenance page.
When your deployment completes, simply remove the rule you added.
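The same rule can also be added from the CLI; a sketch with a placeholder listener ARN, host name and priority:

aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:eu-central-1:123456789012:listener/app/my-alb/abc123/def456 \
  --priority 5 \
  --conditions Field=host-header,Values=app.example.com \
  --actions '[{"Type":"fixed-response","FixedResponseConfig":{"StatusCode":"503","ContentType":"text/html","MessageBody":"<h1>Down for maintenance</h1>"}}]'

# Remove it again when the deployment is done:
# aws elbv2 delete-rule --rule-arn <rule-arn-returned-above>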
The simplest way on AWS is to use Route 53, their DNS service.
You can use the feature of Weighted Round Robin.
"You can use WRR to bring servers into production, perform A/B testing,
or balance your traffic across regions or data centers of varying
sizes."
More information is available in the AWS documentation on this feature.
EDIT: Route 53 recently added a new feature that allows DNS Failover to S3. Check their documentation for more details: http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html
Came up with another solution that's working great for us. Here are the required steps to get a simple 503 http response:
Replicate your EB environment to create another one, call it something like app-environment-maintenance, for instance.
Change the configuration for autoscaling and set the min and max servers both to zero (a CLI sketch follows the swap example below). This won't cost you any EC2 servers and the environment will turn grey and sit in your list.
Finally, you can use the AWS CLI to now swap the environment CNAME to take your main environment into maintenance mode. For instance:
aws elasticbeanstalk swap-environment-cnames \
--profile "$awsProfile" \
--region "$awsRegion" \
--output text \
--source-environment-name app-prod \
--destination-environment-name app-prod-maintenance
This would swap your app-prod environment into maintenance mode. It would cause the ELB to throw a 503 since there aren't any running EC2 instances and then Cloudfront can catch the 503 and return your custom 503 error page, should you wish, as described below.
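For the autoscaling step above (min and max set to zero), a hedged sketch using option settings; the environment name matches the earlier example:

aws elasticbeanstalk update-environment \
  --environment-name app-environment-maintenance \
  --option-settings \
    Namespace=aws:autoscaling:asg,OptionName=MinSize,Value=0 \
    Namespace=aws:autoscaling:asg,OptionName=MaxSize,Value=0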
Bonus configuration for custom error pages using Cloudfront:
We use Cloudfront, as many people will for HTTPS, etc. Cloudfront has error pages. This is a requirement.
Create a new S3 website hosting bucket with your error pages. Consider creating separate files for response codes, 503, etc. See #6 for directory requirements and routes.
Add the S3 bucket to your Cloudfront distribution.
Add a new behavior to your Cloudfront distribution for a route like /error/*.
Set up an error page in CloudFront to handle 503 response codes and point it to your S3 bucket route, like /error/503-error.html.
Now, when your ELB throws a 503, your custom error page will be displayed.
And that's it. I know there are quite a few steps to get the custom error pages and I tried a lot of the suggested options out there including Route53, etc. But all of these have issues with how they work with ELBs and Cloudfront, etc.
Note that after you swap the hostnames for the environments, it takes about a minute or so to propagate.
Route53 is not a good solution for this problem. It takes a significant amount of time for DNS entries to expire before the maintenance page shows up (and then it takes that same amount of time before they update after maintenance is complete). I realize that Lambda and CodeDeploy triggers did not exist at the time this question was asked, but I wanted to let others know that Lambda can be used to create a relatively clean solution for this, which I have detailed in a blog post:
http://blog.ajhodges.com/2016/04/aws-lambda-setting-temporary.html
The gist of the solution is to subscribe a Lambda function to CodeDeploy events, which replaces your ASG with a micro instance serving a static page in your load balancer during deployments.
As far as I could see, we were in a situation where the above answers didn't apply or weren't ideal.
We have a Rails application running Puma with Ruby 2.3 on 64bit Amazon Linux/2.9.0, which seems to come with a (classic) ELB.
So ALB 503 handling wasn't an option.
We also have a variety of hardware clients that I wouldn't trust to always respect DNS TTL, so Route53 is risky.
What did seem to work nicely is a secondary port on the nginx that comes with the platform.
I added this as .ebextensions/maintenance.config
files:
  "/etc/nginx/conf.d/maintenance.conf":
    content: |
      server {
        listen 81;
        server_name _ localhost;
        root /var/app/current/public/maintenance;
      }

container_commands:
  restart_nginx:
    command: service nginx restart
And dropped a copy of https://gist.github.com/pitch-gist/2999707 into public/maintenance/index.html
Now to set maintenance I just switch my ELB listeners to point to port 81 instead of the default 80. No extra instances, S3 buckets, or waiting for clients to refresh DNS.
It only takes maybe ~15s or so for Beanstalk (probably mostly waiting for CloudFormation in the back end) to apply.
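Switching the listener from a script looks roughly like this, assuming a classic ELB managed by Beanstalk, a placeholder environment name, and the aws:elb:listener namespace (verify the option names for your platform):

# Point the port-80 listener at the maintenance server block on port 81...
aws elasticbeanstalk update-environment \
  --environment-name my-env \
  --option-settings Namespace=aws:elb:listener:80,OptionName=InstancePort,Value=81

# ...and back to the application on port 80 when maintenance is over.
aws elasticbeanstalk update-environment \
  --environment-name my-env \
  --option-settings Namespace=aws:elb:listener:80,OptionName=InstancePort,Value=80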
Our deployment process first runs a CloudFormation template to spin up an EC2 micro instance (the maintenance instance), which copies a pre-defined static page from S3 onto the EC2 instance. The CloudFormation stack is supplied with the ELBs to which the micro instance is attached. Then a script (PowerShell or CLI) is run to remove the web instances (EC2) from the ELBs, leaving only the maintenance instance.
This way we switch to the maintenance instance during the deployment process.
In our case, we have two ELBs, one external and the other internal. Our internal ELB is not updated during this process, which is how we run our post-production-deployment smoke tests.
Once testing is done, we run another script to attach the web instances back to the ELBs and delete the maintenance stack.
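A hedged sketch of those attach/detach scripts using the classic ELB CLI (the load balancer name and instance IDs are placeholders):

# Take the web instances out of the external ELB, leaving only the maintenance instance.
aws elb deregister-instances-from-load-balancer \
  --load-balancer-name my-external-elb \
  --instances i-0123456789abcdef0 i-0fedcba9876543210

# After the smoke tests pass, put them back and delete the maintenance stack.
aws elb register-instances-with-load-balancer \
  --load-balancer-name my-external-elb \
  --instances i-0123456789abcdef0 i-0fedcba9876543210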