Downgrade Amazon RDS Multi-AZ deployment to Standard deployment

What might happen if I downgrade my Multi-AZ deployment to a standard deployment? Is there any possibility of an I/O freeze or data loss? If yes, what is the proper way to minimize downtime of data availability?

I tried downgrading from a Multi-AZ deployment to a standard deployment.
The entire process took around 2-3 minutes (the transition time will depend on your database size). The transition was seamless; we did not experience any downtime, and our website worked as expected throughout.
Just to ensure that nothing got lost, I took a snapshot and a manual database dump before downgrading.
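If you prefer to script the same steps rather than click through the console, a minimal boto3 sketch might look like this (the instance and snapshot identifiers below are placeholders):

```python
import boto3

rds = boto3.client("rds")

# Placeholders -- replace with your own identifiers.
INSTANCE_ID = "my-db-instance"
SNAPSHOT_ID = "pre-downgrade-snapshot"

# 1. Take a manual snapshot as a safety net before changing the deployment.
rds.create_db_snapshot(
    DBInstanceIdentifier=INSTANCE_ID,
    DBSnapshotIdentifier=SNAPSHOT_ID,
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=SNAPSHOT_ID)

# 2. Disable Multi-AZ. ApplyImmediately=True starts the change right away;
#    otherwise it waits for the next maintenance window.
rds.modify_db_instance(
    DBInstanceIdentifier=INSTANCE_ID,
    MultiAZ=False,
    ApplyImmediately=True,
)
```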
Hope this helps.

Related

AWS EC2 t3.micro instance sufficiently stable for Spring Boot services

I am new to AWS and recently set up a free t3.micro instance. My goal is to achieve stable hosting of an Angular application with two Spring Boot services. I got everything working, but after a while the Spring Boot services are no longer reachable. When I redeploy a service, it runs again. The Spring Boot services are packaged as JARs, and after deployment each one is started as a Java process.
I thought AWS guaranteed permanent availability out of the box. Do I need some more setup, such as autoscaling, to achieve the desired uptime of the services, or is the t3.micro instance not sufficiently performant, so that I need to upgrade to a stronger instance to avoid the problem?
It depends :)
I think you did the right thing by starting with a small instance type and avoiding over-provisioning in the first place. T3 instance types are generally beneficial for 'burst' usage scenarios, i.e. your application sporadically needs a compute spike but not a persistent one. T3 instance types work on a credit-based system, where your instance 'earns' credits while it is idle, and that buffer is available in times of need (but only until it is consumed entirely). Then the instance needs to idle for a while again to earn the credits back.
For your current problem, a first step could be to get an idea of the current usage by going through the 'Monitoring' tab on the EC2 instance details page. This will help you understand whether the needs are more compute-related or I/O-related, and then you can choose an appropriate instance type from:
https://aws.amazon.com/ec2/instance-types
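If you want to go beyond the Monitoring tab, the burst-credit metrics can also be pulled programmatically. A rough boto3 sketch (the instance ID is a placeholder) that reads a T3 instance's CPUCreditBalance over the last day:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

# CPUCreditBalance is the buffer of burst credits a T3 instance has left.
# A balance that keeps draining to zero suggests the load is not 'bursty'
# and a fixed-performance instance type may be a better fit.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,  # one data point per hour
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```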
The next step could be to profile your application and understand its memory and compute utilisation better. AWS does guarantee availability/durability of resources, but how you consume those resources is an application concern, which AWS does not guarantee or control.
As for your ideas around autoscaling and availability, it again depends on your needs in terms of cost, tolerance for outages in AWS data centres, and so on. For a reliable production setup you could consider them, but they are not the most important thing to address first.

Does AWS DBInstance maintenance keep data intact

We are using a CloudFormation template to create a MySQL AWS::RDS::DBInstance.
My question is: when there is maintenance in progress applying OS upgrades or software/security patches,
Will the database instance be unavailable for the time of the maintenance?
Does it wipe out data from the database instance during maintenance?
If the answer to the first is yes, will using a DBCluster with more than one instance help avoid that short downtime?
From the documentation, I did not find any indication that data loss is a possibility.
Database Instance be unavailable for the time of maintenance
They may reboot the server to apply the maintenance. I've personally never seen anything more than a reboot, but I suppose it's possible they may have to shut it down for a few minutes.
Does it wipe out data from database instance during maintenance?
Definitely not.
If answer to first is yes, will using DBCluster help avoid that short downtime, if I use more than one instances?
Yes, a database in cluster mode would fail over to another node while patches were being applied to one node.
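If you want to see how your application copes with such a failover before real maintenance hits, you can trigger one yourself. A minimal boto3 sketch, assuming a Multi-AZ instance (the identifier is a placeholder):

```python
import boto3

rds = boto3.client("rds")

# Reboot with a forced failover: the standby is promoted, which is roughly
# what happens when AWS patches the primary of a Multi-AZ deployment.
# Requires Multi-AZ to be enabled on the instance.
rds.reboot_db_instance(
    DBInstanceIdentifier="my-multi-az-instance",  # placeholder
    ForceFailover=True,
)
```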
I have been actively working with RDS database systems for the last 5 years. Based on my experience, my answers to your questions are in brackets below.
Database Instance be unavailable for the time of maintenance
[Yes, your RDS instance will be unavailable during database maintenance.]
Does it wipe out data from database instance during maintenance?
[Definitely a big NO]
If answer to first is yes, will using DBCluster help avoid that short downtime, if I use more than one instances?
[Yes. In cluster mode or a Multi-AZ deployment, AWS essentially applies the patches to the standby node or replica first and then fails over to the patched instance. There would still be some brief downtime during the switchover process.]
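Either way, you can check ahead of time what maintenance is queued for your instances. A small boto3 sketch:

```python
import boto3

rds = boto3.client("rds")

# Lists OS upgrades and patches queued per instance, along with the date
# after which AWS will apply them automatically.
pending = rds.describe_pending_maintenance_actions()
for resource in pending["PendingMaintenanceActions"]:
    print(resource["ResourceIdentifier"])
    for action in resource["PendingMaintenanceActionDetails"]:
        print("  ", action["Action"], action.get("AutoAppliedAfterDate"))
```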

AWS EC2 Immediate Scaling Up?

I have a web service running on several EC2 boxes. Based on the Cloudwatch latency metric, I'd like to scale up additional boxes. But, given that it takes several minutes to spin up an EC2 from an AMI (with startup code to download the latest application JAR and apply OS patches), is there a way to have a "cold" server that could instantly be turned on/off?
Not by using Auto Scaling, at least not instantly in the way you describe. You could make it much faster, however, by building your own modified AMI image with the JAR and the latest OS patches already in place. These AMIs can be generated as part of your build pipeline. In that case, your only real wait time is for the OS and services to start, similar to a "cold" server.
Packer is a tool commonly used for such use cases.
Alternatively, you can manage it yourself by keeping servers switched off and starting them with custom Lambda functions triggered by CloudWatch alarms. But since stopped servers aren't exactly free either (you still pay for their EBS storage), I would recommend against that for cost reasons.
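For illustration, a rough sketch of what such a Lambda handler could look like, assuming the standby instance sits behind an Application Load Balancer (the instance ID and target group ARN are placeholders):

```python
import boto3

# Placeholders -- replace with your standby instance and target group.
STANDBY_INSTANCE_ID = "i-0123456789abcdef0"
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/0123456789abcdef"

ec2 = boto3.client("ec2")
elbv2 = boto3.client("elbv2")

def handler(event, context):
    # Start the stopped standby and wait until it is running.
    ec2.start_instances(InstanceIds=[STANDBY_INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[STANDBY_INSTANCE_ID])

    # Register it with the load balancer; it starts receiving traffic
    # once its health checks pass.
    elbv2.register_targets(
        TargetGroupArn=TARGET_GROUP_ARN,
        Targets=[{"Id": STANDBY_INSTANCE_ID}],
    )
```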
Before you venture into auto scaling your infrastructure and spend the time and effort, perhaps you should do a little analysis of the traffic pattern day over day, week over week, and month over month, and see whether it's even necessary. Try answering some of these questions.
What was the highest traffic your app has ever handled? How did the servers fare under that traffic? How was the user response time?
When does your traffic ramp up or hit its peak? Some apps get traffic during business hours, while others get it in the evening.
What is your current throughput? For example, say you can handle 1k requests/min and two EC2 hosts average 20% CPU. If requests triple to 3k requests/min, do you see around 60%-70% average CPU? That is a good indication that your app's usage is fairly predictable and can scale linearly by adding more hosts. But if you've never seen traffic burst like that, there is no point over-provisioning.
Unless you have a Zynga-like application that can see a large burst of traffic at once, perhaps better understanding your traffic pattern and throwing in an additional host as insurance could be helpful. I'm making these assumptions as I don't know the nature of your business.
If you do want to auto scale anyway, one solution would be to containerize your application with Docker or create your own AMI as others have suggested; it will still take a few minutes to boot them up. The next option is to keep hosts on standby and add them to your load balancers using scripts (or Lambda functions) that watch metrics you define (I'm assuming your app is running behind load balancers).
Good luck.

Performance issues with weave networking on Kubernetes cluster

I created a Kubernetes (v1.6.1) cluster on AWS with one master and two slave nodes, then spun up a MySQL instance using Helm and deployed a simple Django web app that queries the latest five rows from the database and displays them. For my web service I specify 'type: LoadBalancer', which creates an ELB on AWS.
If I use 'weave' networking and scale my web app to at least two replicas, I begin experiencing inconsistent response times: most of the time they are reasonable (around 0.1-0.2 s), but 20-40% of requests take significantly longer (3-5 s, sometimes even more than 15 s). However, if I switch to 'flannel' networking, everything works fast, even with 20-30 replicas of the web app. All machines have enough resources, so that's not the problem.
I tried debugging to find out what's causing the delay, and the best explanation I have is that the AWS ELB doesn't work well with 'weave'. Has anyone experienced similar issues? What could be the problem? Please let me know if I should provide any relevant information.
P.S. I'm new to using Kubernetes.

Can I upgrade Elasticache Redis Engine Version without downtime?

I cannot find any information in the AWS documentation about whether modifying the Redis engine version will cause downtime. It does not explain how the upgrade occurs, other than that it's performed in the maintenance window.
Is it safe to upgrade a production ElastiCache Redis instance via the AWS console without loss of data or downtime?
Note: The client library we use is compatible with all versions of Redis so the application should not notice the upgrade.
Changing a cache engine version is a disruptive process which clears all cache data in the cluster. **
I don't know the requirements of your particular application, but if you can't lose your data and you need to do a major version upgrade, it would be advisable to migrate to a new cluster rather than upgrading the current setup.
** http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/VersionManagement.html
I am not sure if the earlier answers are still relevant given that the question was asked nearly 7 years ago, but there are a few things.
Changing the node type or engine version is a Modify action, and your data remains intact on your ElastiCache clusters. I believe there was a doc that described how ElastiCache modifications take place (if I find it, I will link it here).
Essentially, ElastiCache launches a new node on the backend with the modifications you've made and copies your data to it. Suppose the modification you make is a change in the engine version from 5.x to 6.x:
ElastiCache will launch new Redis nodes on the backend with engine 6.x.
As the nodes come up, ElastiCache will read keys from the 5.x node and copy the data to 6.x.
When the copy is complete, ElastiCache will make a change in the DNS records for your cluster's endpoints.
So, there will be some downtime depending on your application's DNS cache TTL configuration. For example, if your application holds the DNS cache for 300 seconds, it can take up to 300 seconds to refresh the DNS cache on your application/client machine, and during that time the application might show some errors.
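On the client side, you can soften that switchover window with short timeouts and retries so stale connections fail fast and reconnect against the updated DNS record. A small sketch, assuming the redis-py client and a placeholder endpoint:

```python
import redis  # redis-py

# Placeholder endpoint -- use your cluster's primary endpoint.
client = redis.Redis(
    host="my-cluster.xxxxxx.ng.0001.use1.cache.amazonaws.com",
    port=6379,
    socket_connect_timeout=2,  # fail fast if the old node has gone away
    socket_timeout=2,
    retry_on_timeout=True,     # reconnect (and re-resolve DNS) on retry
)

try:
    client.set("healthcheck", "ok")
except redis.ConnectionError:
    # During the switchover a request may still hit the old node; treat
    # this as transient and retry at the application level.
    pass
```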
From the ElastiCache side, I do not think they provide any official SLA for this, but this doc [1] mentions it will only take a "few seconds" for the DNS to propagate (depending on engine versions).
Still, you can always take a snapshot of your cluster as a backup. If anything goes south, you can use the snapshot to launch a new cluster with the same data.
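Taking that snapshot can itself be scripted. A minimal boto3 sketch (the node ID and snapshot name are placeholders):

```python
import boto3

elasticache = boto3.client("elasticache")

# Snapshot a Redis node before modifying the cluster, so a new cluster
# can be restored from it if the upgrade goes wrong.
elasticache.create_snapshot(
    CacheClusterId="my-redis-001",      # placeholder node ID
    SnapshotName="pre-upgrade-backup",  # placeholder
)
```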
Also, one more thing: AWS will never upgrade your engine version by itself. The maintenance window for your ElastiCache cluster is for security patches and small optimisations on the engines; it does not affect the engine version.
Cheers!
[1] https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/AutoFailover.html
As mentioned by Will above, the AWS answer has changed, and in theory you can do it without downtime. See:
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/VersionManagement.html
The key points are in terms of downtime and impact on existing use:
The Amazon ElastiCache for Redis engine upgrade process is designed to make a best effort to retain your existing data and requires successful Redis replication.
...
For single Redis clusters and clusters with Multi-AZ disabled, we recommend that sufficient memory be made available to Redis as described in Ensuring That You Have Enough Memory to Create a Redis Snapshot. In these cases, the primary is unavailable to service requests during the upgrade process.
...
For Redis clusters with Multi-AZ enabled, we also recommend that you schedule engine upgrades during periods of low incoming write traffic. When upgrading to Redis 5.0.5 or above, the primary cluster continues to be available to service requests during the upgrade process. When upgrading to Redis 5.0.4 or below, you may notice a brief interruption of a few seconds associated with the DNS update.
There are no guarantees here, so you will have to make your own decision about the risk of losing data if the upgrade fails.
It depends on your current version:
When upgrading to Redis 5.0.6 or above, the primary cluster continues to be available to service requests during the upgrade process (source).
Starting with Redis engine version 5.0.5, you can upgrade your cluster version with minimal downtime. The cluster is available for reads during the entire upgrade and is available for writes for most of the upgrade duration, except during the failover operation, which lasts a few seconds (source); the cluster is available for reads during engine upgrades, and writes are interrupted only for < 1 sec with version 5.0.5 (source).
You can also upgrade ElastiCache clusters with versions earlier than 5.0.5. The process involved is the same but may incur a longer failover time during DNS propagation (30s-1m) (source, source).
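If you decide to trigger the upgrade yourself, here is a minimal boto3 sketch (the replication group ID and target version are placeholders); with ApplyImmediately=False the change is deferred to the next maintenance window, which suits a low-write-traffic slot:

```python
import boto3

elasticache = boto3.client("elasticache")

# Request the engine upgrade for a Redis replication group.
# ApplyImmediately=False defers it to the next maintenance window.
elasticache.modify_replication_group(
    ReplicationGroupId="my-redis-group",  # placeholder
    EngineVersion="6.x",                  # placeholder target version
    ApplyImmediately=False,
)
```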