I'm currently looking to implement WebSockets for a couple of web services, but I was wondering how these stateful HTTP connections will impact blue-green deployments and auto-scaling on AWS.
I've been googling around but haven't come across anything. I would appreciate any advice or input.
Use connection draining: send all new requests to your desired environment (green, for example) and give the blue clients time to fall off.
You can set a max lifetime for your WebSocket connections (the connection-draining period should be longer than that maximum if that kind of reliability is needed).
Otherwise, handle it client side: if the WebSocket drops, initiate a new connection through your AWS ELB to a healthy server. Do not keep any state on your ephemeral ELB backends. This also works when scaling down on AWS.
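To illustrate that client-side handling, here is a minimal reconnect-loop sketch using Python's websockets library; the endpoint URL is a hypothetical placeholder for your ELB:

    import asyncio
    import websockets

    ELB_URL = "wss://my-elb.example.com/events"  # hypothetical ELB endpoint

    async def listen_forever():
        while True:
            try:
                async with websockets.connect(ELB_URL) as ws:
                    async for message in ws:
                        print(message)  # application-specific handling goes here
            except (websockets.ConnectionClosed, OSError):
                # The blue instance was drained away or removed by scale-in;
                # back off briefly, then reconnect through the ELB, which
                # will route us to a healthy (green) backend.
                await asyncio.sleep(1)

    asyncio.run(listen_forever())

Because the backends hold no state, it does not matter which healthy instance the ELB picks on reconnect.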
I came across the so-called Active-Active and Active-Passive routing patterns, diagrammed below.
For the latter, Active-Passive:
It is easy to understand: Passive (HTTP Server 2) is the standby service/instance for Active (HTTP Server 1) to fail over to.
For the former, Active-Active:
I don't understand what the major benefit is. It seems to me both services/instances must be up and running at the same level, and the routing is maybe just something like round robin. Wouldn't this be a waste of resources/cost? Does it add extra computing power? What is the use case for it?
In active-passive mode one web server is sitting there costing you money but not serving any requests. If a sudden surge in traffic came in the extra web server would not be able to help absorb the extra load. The only time the second web server starts being used is when the first web server crashes and can no longer serve requests. This gives you failover in the event of a server crash, but does not help you at all in the event of a sudden surge in traffic.
In active-active mode each web server is serving some of the traffic. In order to scale out your web servers (horizontal scaling) you would have two or more servers, all in "active" mode serving some portion of the web requests. If a sudden surge in traffic comes in, that surge is spread across multiple servers which can hopefully absorb the load, and new servers can be added automatically by AWS as needed, and removed when no longer needed.
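To make the round-robin idea concrete, here is a toy sketch (the server names are made up):

    from itertools import cycle

    servers = cycle(["http-server-1", "http-server-2"])  # both "active"
    for request_id in range(6):
        print(f"request {request_id} -> {next(servers)}")
    # request 0 -> http-server-1, request 1 -> http-server-2, ...

Both servers share the load all the time; losing one simply removes it from the rotation instead of triggering a cold failover.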
I am trying to figure out the best way to deploy a TCP and UDP service on Amazon AWS.
I researched this before asking and could not find anything: I found other protocols like HTTP and MQTT, but no TCP or UDP.
I need to refactor a GPS tracking service currently running on Amazon EC2. The GPS devices send position data over UDP and TCP. Every time a message is received, the server has to respond with an ACKNOWLEDGE message, confirming reception to the GPS device.
The problem I am facing right now, and the motivation to refactor, is:
When traffic increases, the server is not able to keep up with all the messages.
I tried to solve this issue with a load balancer and auto-scaling, but UDP is not supported.
I was wondering if there is something like API Gateway that would give me a TCP or UDP endpoint, leave the message on an SQS queue, and process it with a Lambda function.
Thanks in advance!
Your question really doesn't make a lot of sense - you are asking how to run a service without running a server.
If you have reached the limits of a single instance, and you need to grow, look at using the AWS Network Load Balancer with an autoscaled group of EC2 instances. However, this will not support UDP - if you really need that, then you may have to look at 3rd party support in the AWS Marketplace.
Edit: Serverless architectures are designed for HTTP-based applications, where you send a request and get a response. Since your app is TCP-based and uses persistent connections, most existing serverless implementations simply won't support it. You will need to rewrite your app to use HTTP, or use traditional server-based infrastructure that can support persistent connections.
Edit #2: As of Dec. 2018, API gateway supports WebSockets. This probably doesn't help with the original question, but opens up other alternatives if you need to run lambda code behind a long running connection.
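For reference, a Lambda handler behind an API Gateway WebSocket API receives the route key and connection ID in the event's request context; a minimal sketch (the handler itself is a hypothetical example):

    # Route keys like "$connect" and "$disconnect" are part of the
    # API Gateway WebSocket model; the rest is illustrative.
    def handler(event, context):
        route = event["requestContext"]["routeKey"]
        connection_id = event["requestContext"]["connectionId"]
        if route == "$connect":
            print(f"client {connection_id} connected")
        elif route == "$disconnect":
            print(f"client {connection_id} disconnected")
        return {"statusCode": 200}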
If you want to go more serverless, I think ECS (the EC2 Container Service) has instances that accept TCP and UDP. Also take a look at running Docker containers with Kubernetes. I am not sure if they support those protocols, but I believe they do.
If not, some EC2 instances with load balancing can be your best bet.
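Whichever way the fleet is hosted, the listener the devices need is simple. A minimal UDP sketch of the receive-and-acknowledge loop the question describes (the port and ACK payload are assumptions):

    import socket

    ACK = b"ACK"  # hypothetical acknowledgement payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 5055))  # hypothetical device-facing port

    while True:
        data, addr = sock.recvfrom(4096)   # one GPS position report
        # ... enqueue `data` for processing (e.g. push it to SQS) ...
        sock.sendto(ACK, addr)             # confirm receipt to the device

Run one of these per instance; the balancer in front only needs to spread datagrams across instances, since each ACK goes straight back to the sender's address.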
I run a RabbitMQ HA cluster with 3 nodes and a classic AWS load balancer (LB) in front of them. There are two apps: one publishes and the other consumes through the LB.
When the publisher app starts sending 3 million messages, after a short period of time its connection is put into the Flow Control state. After publishing finishes, the publisher app's logs show that all 3 million messages were sent. On the other hand, the consumer app's log shows only 500K-1M messages (it varies between runs), which means that a large number of messages are lost.
So what is happening is that in the middle of a run, the classic LB decides to change its IP address or drop connections, thus losing a lot of messages (see my update for more details).
The issue does not occur if I skip the LB and hit the nodes directly, doing the load balancing on the app side. Of course, in this case I lose all the benefits of the ELB.
My questions are:
Why is the LB changing IP addresses and dropping connections? Is that related to the high message rate from the publisher or to the Flow Control state?
How can I configure the LB so that this issue doesn't occur?
UPDATE:
This is my understanding of what is happening:
I use AMQP 0-9-1 and publish without 'publisher confirms', so a message is considered sent as soon as it's put on the wire. Also, the connection on the RabbitMQ node is between the LB and the node, not between the publisher app and the node.
Before the communication enters Flow Control, messages are passed from the LB to a node immediately.
Then the connection between the LB and a node enters Flow Control; the publisher app's connection is not blocked, so it continues to publish at the same rate. That causes messages to pile up on the LB.
Then the LB decides to change IP(s) or drop the connection for whatever reason and create a new one, causing all the piled-up messages to be lost. This is clearly visible in the RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2018::10:35:50 ===
closing AMQP connection <0.30342.375> (10.1.1.250:29564 -> 10.1.1.223:5672):
client unexpectedly closed TCP connection
=INFO REPORT==== 6-Jan-2018::10:35:51 ===
accepting AMQP connection <0.29123.375> (10.1.1.22:1886 -> 10.1.1.223:5672)
The solution is to use an AWS Network LB. The Network LB creates a connection between the publisher app and the RabbitMQ node itself, so if the connection is blocked or dropped, the publisher is aware of it and can act accordingly. I have run the same test with 3M messages and not a single message was lost.
In the AWS docs, there's this line which explains the behaviour:
Preserve source IP address: Network Load Balancer preserves the client side source IP, allowing the back-end to see the IP address of the client. This can then be used by applications for further processing.
From: https://aws.amazon.com/elasticloadbalancing/details/
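Separately, since the update notes the messages were published without confirms, enabling publisher confirms would let the publisher detect loss regardless of the balancer. A minimal pika sketch, with hypothetical host and queue names:

    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters("rabbit-nlb.example.com"))  # hypothetical host
    channel = connection.channel()
    channel.queue_declare(queue="positions", durable=True)
    channel.confirm_delivery()  # the broker now acks/nacks every publish

    try:
        channel.basic_publish(exchange="",
                              routing_key="positions",
                              body=b"payload")
    except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
        # The broker did not confirm the message; retry or log it
        # instead of silently losing it as happened behind the classic ELB.
        pass

This trades some throughput for the guarantee that every unconfirmed message is surfaced to the publisher.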
ELBs will change their addresses when they scale in reaction to traffic. New nodes come up, and appear in DNS, and then old nodes may go away eventually, or they may stay online.
It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds. (emphasis added)
— from “Best Practices in Evaluating Elastic Load Balancing”
You may find more useful information in that "best practices" guide, including the concept of pre-warming a balancer with the help of AWS support, and how to ramp up your test traffic in a way that the balancer's scaling can keep up.
The behavior of a classic ELB is automatic, and not configurable by the user.
But it also sounds as if you have configuration issues with your queue, because it seems like it should be more resilient to dropped connections.
Note also that an AWS Network Load Balancer does not change its IP addresses and does not need to scale by replacing resources the way ELB does, because unlike ELB, it doesn't appear to run on hidden instances -- it's part of the network infrastructure, or at least appears that way. This might be a viable alternative.
We're using Redis to collect events from our web application (pub/sub based) behind AWS ELB.
We're looking for a solution that will give us scale-up and high availability across the different servers. We do not wish to put these two servers in a Redis cluster; our plan is to monitor them using CloudWatch and switch between them if necessary.
We tried a simple test: we placed two Redis servers behind the ELB, telnetted to the ELB's DNS name, and watched with 'redis-cli monitor', but we see nothing (when trying the same without the ELB, it works fine).
Any suggestions?
Thanks
I came across this while looking for a similar question, but disagree with the accepted answer. Even though this is pretty old, hopefully it will help someone in the future.
It's more appropriate for your question here to use DNS failover with a Redis Replication Auto-Failover configuration. DNS failover provides groups of availability (if you need that level of scale) and the Replication group provides cache up time.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-configuring.html
Active-passive failover should provide the high availability you're looking for:
Active-passive failover: Use this failover configuration when you want a primary group of resources to be available the majority of the time and you want a secondary group of resources to be on standby in case all of the primary resources become unavailable. When responding to queries, Amazon Route 53 includes only the healthy primary resources. If all of the primary resources are unhealthy, Amazon Route 53 begins to include only the healthy secondary resources in response to DNS queries.
After you set up the DNS, you would point it to the ElastiCache Redis failover group's URL and add multiple groups for higher availability during a failover operation.
However, you might need to setup your application to write and read from different endpoints to maximize the architecture's scalability.
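For that read/write split, a minimal redis-py sketch (the ElastiCache endpoint hostnames are hypothetical):

    import redis

    writer = redis.Redis(host="mygroup.primary.cache.amazonaws.com", port=6379)
    reader = redis.Redis(host="mygroup.replica.cache.amazonaws.com", port=6379)

    writer.set("last_event", "deploy-finished")  # writes go to the primary
    print(reader.get("last_event"))              # reads can come from replicas

During a failover, only the primary endpoint moves; reads against the reader endpoint keep working.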
Sources:
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Replication.html
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoFailover.html
Placing a pair of independent Redis nodes behind an LB will likely not be what you want. What will happen is that the ELB will try to balance connections across the instances, sending half to one and half to the other. This means that commands issued on one connection may not be seen by another, and no data is shared. So client A could publish a message, and client B, subscribed to the other server, won't see the message.
For PUB/SUB behind an ELB you have a secondary problem: ELB will close an idle connection. So if you subscribe to a channel that isn't busy, your ELB will close your connection. The default idle timeout is 60 seconds, meaning that if a message isn't published every single minute your clients will be disconnected.
How much of a problem that is depends on your client library, and frankly, in my experience most don't handle it well: they are unaware of the need to re-subscribe upon re-establishing the connection, meaning you would have to code that yourself.
That said, a Sentinel + Redis solution would be quite ideal if your client has proper Sentinel support. In this scenario, your client asks the Sentinels for the master to talk to, and on a connection failure it repeats this process. This would handle the setup you describe without the problems of being behind an ELB.
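A sketch of that Sentinel-aware loop, including the re-subscribe step, using redis-py; the sentinel addresses, the service name "mymaster", and the channel name are hypothetical:

    from redis.sentinel import Sentinel
    from redis.exceptions import ConnectionError

    sentinels = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)])

    while True:
        try:
            master = sentinels.master_for("mymaster")  # ask who is master now
            pubsub = master.pubsub()
            pubsub.subscribe("events")  # re-subscribe after every reconnect
            for message in pubsub.listen():
                if message["type"] == "message":
                    print(message["data"])
        except ConnectionError:
            # The master failed over or the connection dropped: loop around,
            # ask the sentinels again, and re-establish the subscription.
            continue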
Assuming you are running in VPC:
did you register the EC2 instances with the ELB?
did you add the correct security group setting to the ELB (allowing inbound port 23)?
did you add an ELB listener that maps port 23 on the ELB to port 23 on the instances?
did you set sensible ELB health checks (e.g. TCP on port 23) so that ELB thinks the EC2 instances are healthy?
If the ELB thinks the servers behind it are not healthy then ELB will not send them any traffic.
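You can verify most of that checklist from code by asking the classic ELB which registered instances it considers healthy; a minimal boto3 sketch (the load balancer name is hypothetical, and AWS credentials are assumed):

    import boto3

    elb = boto3.client("elb")
    resp = elb.describe_instance_health(LoadBalancerName="my-redis-elb")
    for state in resp["InstanceStates"]:
        # State is "InService" only when registration, listener, security
        # group, and health check are all correct.
        print(state["InstanceId"], state["State"], state.get("Description", ""))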
Does AWS support WebSockets with SSL?
Can AWS ELB be used for WebSockets over SSL?
What happens when an EC2 instance (machine) is added to or removed from this ELB? Especially removed: what if a machine goes down? Are the existing sockets routed to some other machine, or are they reset and forced to reconnect?
Can ELB be a bottleneck at any point in time?
Any other alternatives? Let me know.
This link might prove partially helpful for you - it would appear that you can do WebSockets over SSL, but currently I'm struggling to implement it.
StackOverflow - Websocket with Tomcat 7 on AWS Elastic Beanstalk
Currently AWS ELB doesn't support WebSocket balancing. There is a trick to do it via SSL, but it has some limitations and depends on your app logic. If the WebSocket connection is used only for server-client communication, it will work. But if you have more advanced logic where clients must communicate with each other via the server, this solution won't work. For example, one client establishes a connection for a chatroom, then other clients connect to the established chatroom and communicate with each other.
The only possible way then is to use HAProxy: http://blog.haproxy.com/2012/11/07/websockets-load-balancing-with-haproxy/
But the example shown only covers how to configure HAProxy with two fixed servers. So if you do not use an Amazon Auto Scaling Group, the solution is good. But if you need to use an ASG, adding and removing instances in the HAProxy config is another challenge, as sketched below.
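One way to approach that challenge is to regenerate the HAProxy backend list from the ASG periodically (or from a lifecycle hook). A boto3 sketch; the group name and port are hypothetical, and AWS credentials are assumed:

    import boto3

    asg = boto3.client("autoscaling")
    ec2 = boto3.resource("ec2")

    group = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=["websocket-asg"]
    )["AutoScalingGroups"][0]

    ids = [i["InstanceId"] for i in group["Instances"]
           if i["LifecycleState"] == "InService"]

    for n, instance_id in enumerate(ids, start=1):
        ip = ec2.Instance(instance_id).private_ip_address
        # Emit one backend line per instance for the haproxy.cfg backend.
        print(f"    server ws{n} {ip}:8080 check")

Write the output into the haproxy.cfg backend section and reload HAProxy whenever the instance list changes.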