AWS loadbalancer per packet - only iptables? - amazon-web-services

Need a design advice. Need to run in AWS loadbalancer per packet (not per flow).
It's for unidirectional UDP based streaming.
That means that each packet received by the loadbalancer should be send to another target - so that all targets receive the same amount of packets.
I do not see any ready solution and considering using EC2 with iptables and "-m statistic --mode random" PREROUTING chain. Any comments on the performance of that module at 1 up to 10Gbit/s scale ? (how strong EC2 instance would i need?)
Any other advices / hints how to achieve it ?
Thanks,

AWS Network Loadbalancer can be configured to send to "random" Targets in the TargetGroup, but this behaviour is not documented and just stated (to be exact, it's not defined how this distribution is done). It's the general ELB behaviour that targets are chosen by some hidden algorithm. Maybe it's worth an experiment? Make sure that Stickiness is turned off, as this is enabled the exact opposite of your use case.
I couldn't find a hard definition of how many GBit/s a NLB will support. But there is the concept of LCU (Load Balancer Capacity Units) that determines also billing and needs to be taken into account. LCUs are exposed in CloudWatch
Custom EC2 Instances will work and also cost a lot as CPU scales (roughly) with network. Here is a general list of EC2 Instances that you can filter for your network requirements and also see the pricing for it.
Maybe you should generally go for Devices with Enhanced Networking and Nitro, as later have special hardware for fast networking.

Related

Outgoing network performance (AWS)

There an external HTTP server (located somewhere in the US), which we must communicate with. We use AWS EC2 instances.
While we can buy a "bigger instance" to improve the internal network performance, is there a way to lessen (optimize?) the roundtrip time between our EC2 instance and the external server? Are therer any tools that could be useful?
You haven't specified what type of EC2 instance you use which is a big factor determining the network performance.
You also said
from my home network, it is much faster than when running on an AWS EC2 (regardless of where the ec2 is hosted)
I know nothing about your home network and your EC2 instance config so this is hard to judge but I'd expect, on average, the EC2 instance having faster network than what's available on the end user's site.
It's also not 100% clear what you are measuring. You said "round trip time" so you are only interested in end-to-end latency? Any particular throughput requirements?
That said, here's a useful cheat sheet which you can download and check your instance type: https://cloudonaut.io/ec2-network-performance-cheat-sheet/
Furthermore, you can use iperf (or iperf3) to perform some experiments on both sides of the connection:
https://www.tecmint.com/test-network-throughput-in-linux/
https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/

Ping between aws and gcp

I have created a Site to Site VPN connection between VPC of Google cloud Platform and AWS in North Virginia region for both the VPCs. But the problem is I have been getting a very high ping and low bandwidth while communicating between the instances. Can any one tell me the reason for this?
image showing the ping data
The ping is very high considering they are in a very close region. Please help.
Multiple reason behind the cause :
1) verify gcp network performance by gcping
2) verify the tcp size and rtt for bandwidth
3) verify with iperf or tcpdump for throughput
https://cloud.google.com/community/tutorials/network-throughput
Be aware that any VPN will be traversing the internet, so even though they are relatively close to each other there will be multiple hops before the instances are connected together.
Remember that from the instance it will need to route outside of AWS networks, then to any hops on the internet to GCP and finally routed to the instance and back again to return the response
In addition there is some variation in performance as the line will not be dedicated.
If you want dedicated performance, without traversing the internet you would need to look at AWS Direct Connect. However, this might limit your project because of cost.
One of the many limits to TCP throughout is:
Throughput <= EffectiveWindowSize / RoundTripTime
If your goal is indeed higher throughput, then you can consider tweaking the TCP window size limits. The default TCP window size under Linux is ~3MB. However, there is more to EffectiveWindowSize than that. There is also the congestion window, which will depend on factors such as packet losses and congestion control heuristics being used (eg cubic vs bbr).
As far as sanity checking the ping RTTs you are seeing, you can compare with ping times you see between an instance in AWS us-east-1 and GCP us-east4 when you are not using a VPN.

Kubernetes - Cross AZ traffic

Does anyone have any advice on how to minimize cross-az traffic for inter-pod communication when running Kubernetes in AWS? I want to keep my microservices pinned to the same availability zone so that microservice-a that resides in az-a will transmit it's payload to microservice-b also in az-a.
I know you can pin pods to a label and keep the traffic in the same AZ, but in addition to minimizing the cross az-traffic I also want to maintain HA by deploying to multiple AZs.
In case you're willing to use alpha features you could use inter-pod affinity or node affinity rules to implement such a behaviour without loosing high availability.
You'll find details in the official documentation
Without that you could just have one deployment pinned to one node and a second deployment pinned to another node and one service which selects pods from both deployments - example code can be found here

Using Redis behing AWS load balancer

We're using Redis to collect events from our web application (pub/sub based) behind AWS ELB.
We're looking for a solution that will allow us to scale-up and high-availability for the different servers. We do not wish to have these two servers in a Redis cluster, our plan is to monitor them using cloudwatch and switch between them if necessary.
We tried a simple test of locating two Redis server behind the ELB, telnetting the ELB DNS and see what happens using 'redis-cli monitor', but we don't see nothing. (when trying the same without the ELB it seems fine)
any suggestions?
thanks
I came across this while looking for a similar question, but disagree with the accepted answer. Even though this is pretty old, hopefully it will help someone in the future.
It's more appropriate for your question here to use DNS failover with a Redis Replication Auto-Failover configuration. DNS failover provides groups of availability (if you need that level of scale) and the Replication group provides cache up time.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html
The Active-passive failover should provide the solution you're wanting with High Availability:
Active-passive failover: Use this failover configuration when you want
a primary group of resources to be available the majority of the time
and you want a secondary group of resources to be on standby in case
all of the primary resources become unavailable. When responding to
queries, Amazon Route 53 includes only the healthy primary resources.
If all of the primary resources are unhealthy, Amazon Route 53 begins
to include only the healthy secondary resources in response to DNS
queries.
After you setup the DNS, then you would point that to the Elasticache Redis failover group's URL and add multiple groups for higher availability during a failover operation.
However, you might need to setup your application to write and read from different endpoints to maximize the architecture's scalability.
Sources:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
Placing a pair of independent redis nodes behind a LB will likely not be what you want. What will happen is ELB will try to balance connections to each instance, splitting half to one and half to another. This means that commands issued by one connection may not be seen by another. It also means no data is shared. So client a could publish a message, and client b being subscribed to the other server won't see the message.
For PUBSUB behind ELB you have a secondary problem. ELB will close an idle connection. So if you subscribe to a channel that isn't busy your ELB will close your connection. As I recall the max you can make this is 60s, meaning if you don't publish a message every single minute your clients will be disconnected.
As to how much of a problem that is depends on your client library, and frankly in my experience most don't handle it well in that they are unaware of the need to re-subscribe upon re-establishing the connection, meaning you would have to code that yourself.
That said a sentinel + redis solution would be quite ideal if your c,isn't has proper sentinel support. In this scenario. Your client asks the sentinels for the master to talk to, and on a connection failure it repeats this process. This would handle the setup you describe, without the problems of being behind an ELB.
Assuming you are running in VPC:
did you register the EC2 instances with the ELB?
did you add the correct security group setting to the ELB (allowing inbound port 23)?
did you add an ELB listener that maps port 23 on the ELB to port 23 on the instances?
did you set sensible ELB health checks (e.g. TCP on port 23) so that ELB thinks the EC2 instances are healthy?
If the ELB thinks the servers behind it are not healthy then ELB will not send them any traffic.

Websocket Load Balancing on AWS EC2

We are building a scaled application that uses WebSockets on AWS EC2. We were considering using the default ELB (Elastic Load Balancing) for this, but that, unnecessarily, makes the load balancer itself a bottleneck for traffic-heavy operations (see this related thread), so we are currently looking into a way to send the client the connection details of a "good instance" to connect to instead. However, the Elastic Load Balancer API does not seem to support a query of the sort "give me (public) connection details of a good instance", which is odd because that is the core functionality of any load balancer. Maybe I have just not looked at the right place?
UPDATE:
Currently, we are investigating two simple solutions using default implementations:
Use ELB in TCP mode which tunnels all traffic through the ELB.
Simply connect to the public IP of the instance that the ELB connected you to for your GET request. The second solution requires public IPs to be enabled, but does not route all traffic through the ELB.
I was concerned about that very last part because I assumed that the ELB is not in the same building as the instance it gave you. But I assume, it usually is in the same building or has some other high-speed connection to the instances? In that case, the tunneling overhead is negligible.
Both solutions seem to be equally viable, or am I overseeing something?
If your application manages to make the ELB a bottleneck, then you are a pretty big fish. Why don't you try first using their load balancer trusting that they do their job right? It is difficult to make it "better", and the most difficult part about this is to define what is "better" in the first place. You definitely did not very well define that in your question, so I am pretty sure that you are well off using just their load balancer.
In some cases it might make sense to develop your own load balancing logic, especially if your machine usage depends on very special metrics not per se accessible to the ELB system.
Yes, I'd say both solutions are viable.
The upside of the second is that it allows greater customization of the load balancing logic you may want to implement (providing an improvement over ELBs round robin), dispatching requests to a server of your convenience after an initial HTTP GET request.
The downside may be on the security front. It's not clear whether security, and SSL is part of your requirements, but in case it is, the second solution forces you to handle it at the ec2 instances level, which can be inconvenient and affect each node's performance. Otherwise websocket communications may be left unsecured.