AWS - EC2 and RDS in different regions is very slow

I'm currently in Sydney and I have the following scenario:
1 RDS instance in N. Virginia
1 EC2 instance in Sydney
1 EC2 instance in N. Virginia
I need this for redundancy, and this is the simplified scenario.
When my app on the EC2 instance in Sydney connects to RDS in N. Virginia, it takes almost 2.5 seconds to give me the result. We can think: OK, that's the latency.
BUT, when I send the request to the EC2 instance in N. Virginia, I get the result in less than 500 ms.
Why is the connection so slow when you access RDS from outside its region?
I mean: I experience this slow connection when I'm running the application on my own computer too. But when the application is in the same region as RDS, it works quicker than on my own computer.

Most likely your request to RDS requires multiple round trips to complete. I.e. at first your EC2 instance requests something from RDS, then something else based on the first response, and so on. Without seeing your database code, it's hard to say exactly what might be the cause of that.
You say that when you talk to the remote EC2 instance instead, you get the response in less than 500 ms. That suggests that setting up a TCP connection and sending a single request with a reply takes about 500 ms. Based on that, my guess is that your database conversation requires at least 5x back-and-forth traffic.
There is no additional penalty with RDS in terms of using it out of region, but most database protocols are not optimized for high-latency conditions. You might be much better off setting up a read replica in Sydney.
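As a rough back-of-the-envelope illustration (the RTT and round-trip counts below are assumptions, not measurements from the question), a handful of sequential round trips adds up quickly at trans-Pacific latencies:

    # Back-of-the-envelope latency estimate; RTT and round-trip counts are assumptions.
    rtt_seconds = 0.22          # ballpark Sydney <-> N. Virginia round-trip time
    connection_roundtrips = 3   # TCP handshake + TLS + database auth (approximate)
    query_roundtrips = 5        # sequential queries issued one after another

    waiting = (connection_roundtrips + query_roundtrips) * rtt_seconds
    print(f"Time spent purely waiting on the network: {waiting:.2f} s")  # ~1.76 s

Batching queries, reusing connections, or putting a read replica in the same region as the app removes most of that waiting time.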

If you are connecting to RDS over the public-facing network, it can be slow. AWS has launched cross-region VPC peering; peer the VPCs in each region (make sure their IP ranges do not conflict) and try to connect over the private network instead.
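A minimal boto3 sketch of what that peering setup looks like (the VPC IDs, route table IDs, regions, and CIDR blocks below are placeholders; security groups still need to allow the traffic):

    import boto3

    # Placeholder IDs and regions -- substitute your own.
    requester = boto3.client("ec2", region_name="ap-southeast-2")   # Sydney
    accepter = boto3.client("ec2", region_name="us-east-1")         # N. Virginia

    # Request the peering connection from the Sydney VPC to the N. Virginia VPC.
    peering = requester.create_vpc_peering_connection(
        VpcId="vpc-sydney111",
        PeerVpcId="vpc-virginia222",
        PeerRegion="us-east-1",
    )
    pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

    # Accept it in the peer region (it can take a moment to become visible there).
    accepter.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

    # Route each VPC's traffic for the other VPC's CIDR through the peering connection.
    requester.create_route(
        RouteTableId="rtb-sydney111",
        DestinationCidrBlock="10.1.0.0/16",   # N. Virginia VPC CIDR (must not overlap)
        VpcPeeringConnectionId=pcx_id,
    )
    accepter.create_route(
        RouteTableId="rtb-virginia222",
        DestinationCidrBlock="10.0.0.0/16",   # Sydney VPC CIDR
        VpcPeeringConnectionId=pcx_id,
    )

Note that peering keeps the traffic on AWS's network rather than the public internet, but it does not remove the Sydney-to-Virginia latency itself.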

Related

AWS EC2 instance fails consistently at 30 seconds on long page load

I am running an ECS instance on EC2 with an application load balancer, a Route 53 domain, and an RDS db. This is an internal business application that I have restricted IP access to.
I have run this app for 3 weeks with no issues. However, today the data that the web app ingests is abnormally large. This is not a mistake. Because of this, a webpage takes approximately 4 minutes to complete, which I verified it does on my local machine. However, running the same operation on AWS fails at precisely 30 seconds every time.
I have connected the app running on my local machine to my production RDS db and am able to download and upload the data with no issue. So there is no issue with the RDS db. In addition, this same functionality has worked previously and only failed today due to the large amount of data.
I spent hours with Amazon support trying to solve this issue, but we couldn't figure it out. I am assuming it is a setting on one of the AWS services I am using that has a TTL or timeout set to 30 seconds, but I couldn't find it in any of the services I am using:
route53
RDS
ECS
ECR
EC2
Load Balancer
Target Group
You have a backend instance timeout, likely in the web server config.
Right now your ELB has a timeout of 60 seconds, but your assets are failing at 30.
There are only a couple of places in an AWS stack with hardcoded timeouts like that. I'm thinking (because this is the first time it has happened) that you have one of the following:
Size limits in the upstream, or
Time limits on connection keep-alive
Look at your web server software (httpd/nginx). Nginx has something called "upstream.conf" where you can set upstream timeouts. I'm not sure if httpd does as well.
Resources:
https://serverfault.com/questions/414987/nginx-proxy-timeout-while-uploading-big-files
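As a sanity check before editing web server configs, you can confirm what the load balancer's idle timeout actually is; a quick boto3 sketch for an Application Load Balancer (the ARN and region are placeholders):

    import boto3

    elbv2 = boto3.client("elbv2", region_name="us-east-1")  # region is an assumption

    # Placeholder ARN -- substitute the ALB from your stack.
    attrs = elbv2.describe_load_balancer_attributes(
        LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-alb/abc123"
    )
    for attr in attrs["Attributes"]:
        if attr["Key"] == "idle_timeout.timeout_seconds":
            print("ALB idle timeout:", attr["Value"], "seconds")

If this prints 60 rather than 30, the 30-second cutoff is coming from somewhere behind the load balancer.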
From the NLB documentation, maybe relevant:
EC2 instances must respond to a new request within 30 seconds in order to establish a return path.
I don't actually know what a return path is, nor what a 'response' is in this context, since NLB has no concept of requests or responses.
- https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout
EDIT: Disregard; this must have to do with UDP NATing. 'Response' here is probably a packet going back from the EC2 instance to the client.

Ping between aws and gcp

I have created a Site-to-Site VPN connection between a VPC on Google Cloud Platform and a VPC on AWS, both in the North Virginia region. But the problem is I have been getting very high ping and low bandwidth while communicating between the instances. Can anyone tell me the reason for this?
image showing the ping data
The ping is very high considering the regions are very close to each other. Please help.
There can be multiple reasons behind this:
1) Verify GCP network performance with gcping.
2) Verify the TCP window size and RTT to estimate the achievable bandwidth.
3) Verify throughput with iperf or tcpdump (a rough connect-time probe is also sketched below).
https://cloud.google.com/community/tutorials/network-throughput
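If installing iperf is not immediately an option, a very rough Python probe of TCP connect times can at least confirm the RTT you see with ping (the host, port, and sample count are placeholders; this measures connection setup only, not throughput):

    import socket
    import statistics
    import time

    # Placeholder target: any TCP port reachable on the peer VM (22 assumes SSH is open).
    HOST, PORT, SAMPLES = "10.0.1.10", 22, 10

    connect_ms = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        with socket.create_connection((HOST, PORT), timeout=5):
            pass
        connect_ms.append((time.monotonic() - start) * 1000)

    print(f"median TCP connect time: {statistics.median(connect_ms):.1f} ms")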
Be aware that any VPN will be traversing the internet, so even though the regions are relatively close to each other, there will be multiple hops before the instances are connected together.
Remember that traffic from the instance will need to route out of AWS's network, across whatever hops lie on the internet to GCP, and finally be routed to the instance, and then back again to return the response.
In addition, there is some variation in performance because the line is not dedicated.
If you want dedicated performance without traversing the internet, you would need to look at AWS Direct Connect. However, this might not suit your project because of cost.
One of the many limits on TCP throughput is:
Throughput <= EffectiveWindowSize / RoundTripTime
If your goal is indeed higher throughput, then you can consider tweaking the TCP window size limits. The default maximum TCP window size under Linux is ~3 MB. However, there is more to EffectiveWindowSize than that: there is also the congestion window, which depends on factors such as packet loss and the congestion control heuristic in use (e.g. cubic vs bbr).
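A quick worked example of that bound (the ~3 MB window is the Linux default mentioned above; the 20 ms RTT is purely illustrative, so substitute your measured value):

    # Illustrative numbers only -- substitute your measured RTT and window size.
    effective_window_bytes = 3 * 1024 * 1024   # ~3 MB default maximum window
    rtt_seconds = 0.020                        # assumed 20 ms round-trip time

    max_throughput_bps = effective_window_bytes / rtt_seconds * 8
    print(f"Upper bound: {max_throughput_bps / 1e9:.2f} Gbit/s")   # ~1.26 Gbit/s

Double the RTT or halve the effective window and the ceiling drops proportionally, which is why a VPN path with extra hops and packet loss can feel so much slower.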
As far as sanity checking the ping RTTs you are seeing, you can compare with ping times you see between an instance in AWS us-east-1 and GCP us-east4 when you are not using a VPN.

Client Connections Count on AWS EFS?

Based on AWS Documentation, Client Connections is:
The number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance.
Since we have around 10 T3 EC2 instances running, I would think that ClientConnections would return a maximum of 10.
However, on a normal day, there's around 300 connections and the max we've seen is 1,080 connections.
I have trouble understanding what exactly the ClientConnections count is.
I initially thought 1 EC2 instance = 1 connection (since it only mounts once), but this doesn't seem to be the case.
Then I thought it might be per read/write operation. But looking at the graph on the right, reads actually dip (we don't have many writes on our website).
Any help appreciated! I believe I might be missing some core concepts, so please feel free to add them in.
Client Connection Count refers to the number of IP addresses (EFS clients) connecting to the EFS mount target on a specific NFS port, e.g. NFS port 2049.
Resource: https://aws.amazon.com/premiumsupport/knowledge-center/list-instances-connected-to-efs/
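If you want to pull the raw numbers rather than read them off the console graph, here is a boto3 sketch of the same CloudWatch metric (the file system ID and region are placeholders):

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # region is an assumption

    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EFS",
        MetricName="ClientConnections",
        Dimensions=[{"Name": "FileSystemId", "Value": "fs-12345678"}],  # placeholder
        StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
        EndTime=datetime.now(timezone.utc),
        Period=300,                # 5-minute buckets
        Statistics=["Sum"],        # total reported connections per period
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Sum"])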

Can AWS Elastic Load Balancer be used to only send traffic to a second server if the first fails

Can an AWS Elastic Load Balancer be set up so that it sends all traffic to a main server and, only if that server fails, sends traffic to a second server?
I have an existing web app I picked up that was never built to run on multiple servers, and the client has become worried about redundancy. They don't want to invest enough to make it run well across multiple servers, so I was thinking I could set up a second EC2 server with a MySQL slave and periodically copy files from the primary server to the secondary using rsync, then have an AWS ELB send traffic to the primary server and only if that fails send it to the second server.
AWS load balancers don't support "backup" nodes that only take traffic when the primary is down.
Beyond that, you are proposing a complicated scenario.
was thinking I could setup a second EC2 server with a MySQL slave
If you do that, you can only fail over once, then you can't fail back, because the master database will then be obsolete. For a configuration like this to work and be useful, your two MySQL servers need to be configured with master/master (circular) replication, so that each is a replica of the other. This is an advanced configuration that requires expertise and caution.
For the MySQL component, an RDS instance with multi-AZ enabled will provide you with hands-off fault tolerance of the database.
Of course, the client may be unwilling to pay for this as well.
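If the client does agree to RDS, turning on Multi-AZ for an existing instance is a single API call. A hedged boto3 sketch (the instance identifier and region are placeholders; note the conversion can briefly affect I/O while the standby is created):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption

    # Placeholder identifier -- a synchronous standby is created in another AZ
    # and failover becomes automatic on instance or AZ failure.
    rds.modify_db_instance(
        DBInstanceIdentifier="my-app-db",
        MultiAZ=True,
        ApplyImmediately=True,
    )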
A reasonable shortcut for small systems might be EC2 instance recovery, which will bring the site back up if the underlying hardware fails. This feature replaces a failed instance with a new instance, reattaches the EBS volumes, and starts it back up. If the system is stable and you have a solid backup strategy for all data, this might be sufficient. Effective redundancy as a retrofit is non-trivial.

Using Redis behind an AWS load balancer

We're using Redis to collect events from our web application (pub/sub based) behind an AWS ELB.
We're looking for a solution that will allow us to scale up and get high availability across the different servers. We do not wish to have these two servers in a Redis cluster; our plan is to monitor them using CloudWatch and switch between them if necessary.
We tried a simple test of placing two Redis servers behind the ELB, telnetting to the ELB DNS name and watching with 'redis-cli monitor', but we don't see anything. (When trying the same without the ELB it seems fine.)
Any suggestions?
Thanks
I came across this while looking for a similar question, but disagree with the accepted answer. Even though this is pretty old, hopefully it will help someone in the future.
It's more appropriate for your question here to use DNS failover with a Redis replication auto-failover configuration. DNS failover provides groups of availability (if you need that level of scale) and the replication group provides cache uptime.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html
Active-passive failover should provide the solution you're looking for, with high availability:
Active-passive failover: Use this failover configuration when you want a primary group of resources to be available the majority of the time and you want a secondary group of resources to be on standby in case all of the primary resources become unavailable. When responding to queries, Amazon Route 53 includes only the healthy primary resources. If all of the primary resources are unhealthy, Amazon Route 53 begins to include only the healthy secondary resources in response to DNS queries.
After you set up the DNS, you would point it to the ElastiCache Redis failover group's URL and add multiple groups for higher availability during a failover operation.
However, you might need to set up your application to write and read from different endpoints to maximize the architecture's scalability.
Sources:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
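A hedged boto3 sketch of the active-passive record pair described above (the hosted zone ID, record name, health check ID, and the two endpoint hostnames are all placeholders):

    import boto3

    route53 = boto3.client("route53")

    route53.change_resource_record_sets(
        HostedZoneId="Z1234567890ABC",           # placeholder hosted zone
        ChangeBatch={
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "redis.example.internal",
                        "Type": "CNAME",
                        "TTL": 60,
                        "SetIdentifier": "primary",
                        "Failover": "PRIMARY",
                        "HealthCheckId": "11111111-2222-3333-4444-555555555555",
                        "ResourceRecords": [{"Value": "primary-redis-endpoint.example.internal"}],
                    },
                },
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "redis.example.internal",
                        "Type": "CNAME",
                        "TTL": 60,
                        "SetIdentifier": "secondary",
                        "Failover": "SECONDARY",
                        "ResourceRecords": [{"Value": "standby-redis-endpoint.example.internal"}],
                    },
                },
            ]
        },
    )

The application then connects to the single record name; Route 53 answers with the primary endpoint while its health check passes and with the secondary otherwise.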
Placing a pair of independent Redis nodes behind a load balancer will likely not do what you want. What will happen is that the ELB tries to balance connections across the instances, sending half to one and half to the other. This means that commands issued over one connection may not be seen by another. It also means no data is shared. So client A could publish a message, and client B, being subscribed to the other server, won't see the message.
For pub/sub behind an ELB you have a secondary problem: the ELB will close an idle connection. So if you subscribe to a channel that isn't busy, the ELB will close your connection. As I recall, the maximum you can make this is 60 s, meaning that if you don't publish a message every single minute your clients will be disconnected.
How much of a problem that is depends on your client library, and frankly, in my experience most don't handle it well, in that they are unaware of the need to re-subscribe upon re-establishing the connection, meaning you would have to code that yourself.
That said, a Sentinel + Redis solution would be quite ideal if your client has proper Sentinel support. In this scenario, your client asks the sentinels which master to talk to, and on a connection failure it repeats this process. This would handle the setup you describe without the problems of being behind an ELB.
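For reference, a minimal redis-py sketch of that Sentinel lookup (the sentinel hosts and the master name 'mymaster' are placeholders):

    from redis.sentinel import Sentinel

    # Placeholder sentinel addresses and master name.
    sentinel = Sentinel(
        [("sentinel-1.internal", 26379), ("sentinel-2.internal", 26379)],
        socket_timeout=1.0,
    )

    # Ask the sentinels who the current master is; after a failover, simply ask again.
    master = sentinel.master_for("mymaster", socket_timeout=1.0)
    replica = sentinel.slave_for("mymaster", socket_timeout=1.0)

    master.publish("events", "user-signed-up")   # writes/publishes go to the master
    print(replica.ping())                        # reads can go to a replica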
Assuming you are running in a VPC, check the following (a boto3 sketch of these steps follows the list):
did you register the EC2 instances with the ELB?
did you add the correct security group settings to the ELB (allowing inbound traffic on the Redis port, 6379 by default)?
did you add an ELB listener that maps the Redis port on the ELB to the same port on the instances?
did you set sensible ELB health checks (e.g. TCP on the Redis port) so that the ELB considers the EC2 instances healthy?
If the ELB thinks the servers behind it are not healthy, it will not send them any traffic.
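For completeness, a hedged boto3 sketch of those steps for a Classic Load Balancer (the load balancer name and instance IDs are placeholders, and 6379 is assumed as the Redis port):

    import boto3

    elb = boto3.client("elb", region_name="us-east-1")  # region is an assumption
    LB_NAME = "redis-elb"                               # placeholder name

    # 1. Register the EC2 instances with the ELB.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=LB_NAME,
        Instances=[{"InstanceId": "i-0123456789abcdef0"}, {"InstanceId": "i-0fedcba9876543210"}],
    )

    # 2. Add a TCP listener mapping the Redis port on the ELB to the instances.
    elb.create_load_balancer_listeners(
        LoadBalancerName=LB_NAME,
        Listeners=[{
            "Protocol": "TCP",
            "LoadBalancerPort": 6379,
            "InstanceProtocol": "TCP",
            "InstancePort": 6379,
        }],
    )

    # 3. Configure a TCP health check on the Redis port.
    elb.configure_health_check(
        LoadBalancerName=LB_NAME,
        HealthCheck={
            "Target": "TCP:6379",
            "Interval": 30,
            "Timeout": 5,
            "UnhealthyThreshold": 2,
            "HealthyThreshold": 2,
        },
    )

    # 4. The ELB's security group still needs an inbound rule for port 6379;
    #    that is done through the EC2 API (authorize_security_group_ingress).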