What is the cause of websocket latency spikes?

What is the cause of websocket latency spikes? - amazon-web-services

I am running a server (that uses tornado python) on a single AWS instance and I am running into spikes in websocket latency.
Profiling the round trip time from when a websocket message is sent to the client, which then immediately sends an ack message back to the server, to when the server receives the ack message yields an average of <.1 second, however I note sometimes it goes up to 3 seconds. Note: there are no spikes when running the server locally.
What could be the cause or fix for this? I looked at the CPU usage and it only goes up to 40% max. The spikes are not correlated with heavy traffic (2 or 3 clients usually) and the client's internet seems fine. I find it hard to believe the instance is going beyond capacity with such low usage.

The fact that the spike is 3 seconds is actually telling you a lot more than you may suspect, about the nature of the problem.
It's packet loss.
TCP, as you likely know, is said to provide "reliable" transport, guaranteeing that payload sent is received by the far end in the order in which it was sent, because TCP reassembles things in the correct order before delivering the payload. One significant way in which this is accomplished is by the automatic retransmission of packets that are considered to have been lost.
You'll never guess the default initial timer value for retransmissions of lost packets. Or, perhaps, now, you will.
It's 3 seconds in many, if not most, implementations, based on standards established several years ago in a time when the bandwidth and latency of today's transmission links were unheard of, perhaps unimagined.
You won't see evidence of the retransmission at at the websocket server or the client software, because TCP shields the higher layers from knowing that it occurs... but 3 seconds is a dead giveaway that this is exactly the problem.
You'll see the retransmissions of the traffic occurring if you observe the network traffic with a packet sniffer, though that will only serve to confirm that this is the issue.
It could be loss from server to client, or loss from client to server. The latter is generally more likely, since clients often have a lower amount of available upstream bandwidth... but the directionality of the packet loss doesn't clearly indicate the physical location where it is occurring. Unless your client keeps track of local time, so that request and response initiation times can be correlated, you don't know whether the delay is in the message, or in the acknowledgement.
Under relatively light load, it seems unlikely that the problem is on your instance or in the AWS network on your side, and you obviously can't connect a sniffer to arbitrary points on the Internet to pinpoint the problem.
Given a case like this, it may be easier -- and surprisingly feasible -- to prove where the problem isn't, rather than where it is.
One technique for this would be to create a deliberate detour for the traffic through different equipment located elsewhere -- such as a different AWS region or another cloud provider.
First, of course, you'll want to learn to spot these retransmissions using wireshark.
Then, configure a proxy server at a different location, using a simple TCP connection proxy -- such as HAProxy, or even a simple tool like redir or socat.
Such a configuration will listen for connections from clients, and when one is established, will create a new TCP connection to the destination (your websocket server) but -- importantly -- they only tie the two connections together at the payload level -- not the TCP level, and of course nothing lower -- so retransmissions will only be seen on the wire at all between this intermediate server and the end of the connection with the packet loss problem. The other end will show no evidence of the retransmissions -- just data arriving later than expected.
For this test to be meaningful, the proxy needs to be located away from the server and the client, and with no meaningful common infrastructure -- hence the suggestion of placing it in a different AWS region. A different availability zone in the same region may share common Internet infrastructure at some level, so that's not far enough away for this purpose.
If client <--> proxy <--> server shows TCP retransmissions on the path between proxy and server, and not between client and proxy, the problem really is likely to be in your server, its hardware, network, or Internet connection, and you'll have to proceed accordingly.
Conversely (and, I would suggest, more likely) if the path between proxy and server is free of retransmissions but the path between client and proxy is still dirty, you have eliminated the server and its infrastructure as the source of the problem. How to proceed is up to you, but at this point you do know what the problem... isn't.
Two other possibilities:
Both sides remain dirty, which is the least likely scenario. Rule 1 of troubleshooting is to assume initially that you only have one problem, not two.
Or, both sides are suddenly and unexectedly clean when traffic uses this setup, which suggests thay your test setup has routed around a broken piece of the Internet. You've "solved" it but have no idea how. We'll also hope this isn't the outcome, but given the vagaries of the global Internet, it's not unthinkable that your stack may include components like this, with geolocation-DNS-based selection of an intermediate endpoint. This seems like a convolution but does have its place.
Such a tactic is actually part of the logic behind the S3 transfer acceleration feature. The content is not any closer to the end user, but the TCP connection from the browser is being terminated on equipment in the AWS edge network, at a location that is often nearer to the browser, and a second TCP connection back to the bucket is established, with the payload connected together... and, yes, it's faster and more stable, with the significance of the change becoming more notable as distance and connection quality vary.

Related

Improve Upload Speed

I've a client who I developed a Rails app for. The app relies on his customers uploading varies of images, files, and pdf size ranges from 1mb to 100mb.
His been telling me that many of his customer are complaining the slowness and unstable upload speed.
I use direct connect to Amazon S3 to handle the upload. I explain to him that it could there are factors that is out of my control in terms of upload speed.
But he insist that there is something we can do to improve upload speed.
I'm running out of ideas and expertise here. Does anyone have a solution?

On the surface, there are two answers -- no, of course there's nothing you can do, the Internet is a best-effort transport, etc., etc.,... and no, there really shouldn't be a problem, because S3 uploads perform quite well.
There is an option worth considering, though.
You can deploy a global network of proxy servers in front of S3 and use geographic DNS to route those customers to their nearest proxy. Then install high-speed, low latency optical circuits from the proxies back to S3, reducing the amount of "unknown" in the path, as well as reducing the round-trip time and packet loss potential between the browser and the chosen proxy node at the edge of your network, improving throughput.
I hope the previous paragraph is amusing on first reading, since it sounds like a preposterously grandiose plan for improving uploads to S3... but of course, I'm referring to CloudFront.
You don't actually have to use it for downloads; you can, if you want, just use it for uploads.
your users can now benefit from accelerated content uploads. After you enable the additional HTTP methods for your application’s distribution, PUT and POST operations will be sent to the origin (e.g. Amazon S3) via the CloudFront edge location, improving efficiency, reducing latency, and allowing the application to benefit from the monitored, persistent connections that CloudFront maintains from the edge locations to the origin servers.
https://aws.amazon.com/blogs/aws/amazon-cloudfront-content-uploads-post-put-other-methods/
To illustrate that the benefit here does have a solid theoretical basis...
Back in the day when we still used telnet, when T1s were fast Internet connections and 33.6kbps was a good modem, I discovered that I had far better responsiveness from home, making a telnet connection to a distant system, if I first made a telnet connection to a server immediately on the other side of the modem link, then make a telnet connection to the distant node from within the server.
A direct telnet connection to the distant system followed exactly the same path, through all the same routers and circuits, and yet, it was so sluggish as to be unusable. Why the stark difference, and what caused the substantial improvement?
The explanation was that making the intermediate connection to the server meant there were two independent TCP connections, with only their payload tied together: me to the server... and the server to the distant system. Both connections were bad in their own way -- high latency on my modem link, and congestion/packet loss on the distant link (which had much lower round-trip times, but was overloaded with traffic). The direct connection meant I had a TCP connection that had to recover from packet loss while dealing with excessive latency. Making the intermediate connection meant that the recovery from the packet loss was not further impaired by the additional latency added by my modem connection, because the packet loss was handled only on the 2nd leg of the connection.
Using CloudFront in front of S3 promises to solve the same sort of problem in reverse -- improving the responsiveness, and therefore the throughput, of a connection of unknown quality by splitting the TCP connection into two independent connections, at the user's nearest CloudFront edge.

Debugging network applications and testing for synchronicity?

If I have a server running on my machine, and several clients running on other networks, what are some concepts of testing for synchronicity between them? How would I know when a client goes out-of-sync?
I'm particularly interested in how network programmers in the field of game design do this (or just any continuous network exchange application), where realtime synchronicity would be a commonly vital aspect of success.
I can see how this may be easily achieved on LAN via side-by-side comparisons on separate machines... but once you branch out the scenario to include clients from foreign networks, I'm just not sure how it can be done without clogging up your messaging system with debug information, and therefore effectively changing the way that synchronicity would result without that debug info being passed over the network.
So what are some ways that people get around this issue?
For example, do they simply induce/simulate latency on the local network before launching to foreign networks, and then hope for the best? I'm hoping there are some more concrete solutions, but this is what I'm doing in the meantime...

When you say synchronized, I believe you are talking about network latency. Meaning, that a client on a local network may get its gaming information sooner than a client on the other side of the country. Correct?
If so, then I'm sure you can look for books or papers that cover this kind of topic, but I can give you at least one way to detect this latency and provide a way to manage it.
To detect latency, your server can use a type of trace route program to determine how long it takes for data to reach each client. A common Linux program example can be found here http://linux.about.com/library/cmd/blcmdl8_traceroute.htm. While the server is handling client data, it can also continuously collect the latency statistics and provide the data to the clients. For example, the server can update each client on its own network latency and what the longest latency is for the group of clients that are playing each other in a game.
The clients can then use the latency differences to determine when they should process the data they receive from the server. For example, a client is told by the server that its network latency is 50 milliseconds and the maximum latency for its group it 300 milliseconds. The client then knows to wait 250 milliseconds before processing game data from the server. That way, each client processes game data from the server at approximately the same time.
There are many other (and probably better) ways to handle this situation, but that should get you started in the right direction.

Optimizing Jetty for heartbeat detection of thousands of machines?

I have a large number of machines (thousands and more) that every X seconds would perform an HTTP request to a Jetty server to notify they are alive. For what value of X should I use persistent HTTP connections (which limits number of monitored machines to number of concurrent connections), and for what value of X the client should re-establish a TCP connection (which in theory would allow to monitor more machines with the same Jetty server).
How would the answer change for HTTPS connections? (Assuming CPU is not a constraint)
This question ignores scaling-out with multiple Jetty web servers on purpose.
Update: Basically the question can be reduced to the smallest recommended value of lowResourcesMaxIdleTime.

I would say that this is less of a jetty scaling issue and more of a network scaling issue, in which case 'it depends' on your network infrastructure. Only you really know how your network is laid out and what sort of latencies are involved in order to come up with a value of X.
From an overhead perspective the persistent HTTP connections will of course have some minor effect (well I say minor but depends on your network) and the HTTPS will again have a larger impact....but only from a volume of traffic perspective since you are assuming CPU is not a constraint.
So from a jetty perspective, it really doesn't need to be involved in the question, you seem to ultimately be asking for help optimizing bytes of traffic on the wire so really you are looking for the best protocol at this point. Since with HTTP you are having to mess with headers for each request you may be well served looking at something like spdy or websocket which will give you persistent connections but are optimized for low round trip network overhead. But...they seem sort of overkill for a heartbeat. :)

How about just make them request at different time? Assume first machine request, then you pick a time to response to that machine as the next time to heart beat of that machine (also keep the id/time at jetty server), the second machine request, you can pick another time to response to second machine.
In this way, you can make each machine perform heart beat request at different time so no concurrent issue.
You can also use a random time for the first heart beat if all machines might start up at the same time.

Bypassing the NAT and open ports through C++ to achieve low latency

My purpose is to:
not ask to the user for opening ports on his router
doing everything by code with my application
it's possible to do that? Considering that this application is supposed to work only with other machiines with the same application installed can i write some kind of protocol from the ground-up that can do that?
My general idea is to make the connection as fast as possible, i also have to exchange small packets, lowering the delay is much more important for me than just having an high throughput.

Do not mess with the NAT. That won't help with latency all that much anyway. You're using TCP/IP, which is a pretty high level protocol, and is relatively slow. That is - the protocol does a lot of great work for you - but at a cost in terms of latency. ( It maintains connection state, and keeps packets in order, and does a decent job of guaranteeing packet delivery etc. )
If you want a very low latency network channel use UDP - this is lower level, and does not do nearly as much work as TCP. UDP simply does its best to deliver each packet to the destination without maintaining connection open, packets don't necessarily arrive in order, and there is know way to know if the packet even got to its destination.
You need to build those things yourself - or learn to live without them.
Applications built on UDP tend to repeat a lot of information, and implement the protocol logic with a lot of room for error. The result is generally lower latency - but at a cost generally of either reliability or transfer rate.
Also - if you need low latency do NOT tunnel through another protocol like tunneling through SSH or something. This will just increase latency.

This answer is geared toward your requirement that you not ask the customer to adjust any settings on their network. If you are okay with having the customer change things on their network for you, then you should just implement your own custom application as Rafael suggests.
You could use the HTTP protocol over port 80 or HTTPS (which is HTTP over SSL/TLS) protocol over port 443 if you need it encrypted. Or use SSH over port 22.
These are just ports that are typically opened up by a firewall, so you are likely to get through. There is no guarantee that they will be open.
The reason to use the HTTP protocol when using the HTTP or HTTPS is that the firewall may be doing deep packet inspection, and it may blackhole or reset your connection if it doesn't match its expectations. Same goes for using SSH over port 22. You might be better off using SSL over 443, and just hope the firewall is not decrypting the data to do HTTP inspection (just remember that it might be decrypting).
For implementation, you could use libCURL, or Boost.Asio.

TCP/IP and designing networking application

i'm reading about way to implemnt client-server in the most efficient manner, and i bumped into that link :
http://msdn.microsoft.com/en-us/library/ms740550(VS.85).aspx
saying :
"Concurrent connections should not exceed two, except in special purpose applications. Exceeding two concurrent connections results in wasted resources. A good rule is to have up to four short lived connections, or two persistent connections per destination "
i can't quite get what they mean by 2... and what do they mean by persistent?
let's say i have a server who listens to many clients , whom suppose to do some work with the server, how can i keep just 2 connections open ?
what's the best way to implement it anyway ? i read a little about completion port , but couldn't find a good examples of code, or at least a decent explanation.
thanks

Did you read the last sentence:
A good rule is to have up to four
short lived connections, or two
persistent connections per
destination.
Hard to say from the article, but by destination I think they mean client. This isn't a very good article.

A persistent connection is where a client connects to the server and then performs all its actions without ever dropping the connection. Even if the client has periods of time when it does not need the server, it maintains its connection to the server ready for when it might need it again.
A short lived connection would be one where the client connects, performs its action and then disconnects. If it needs more help from the server it would re-connect to the server and perform another single action.
As the server implementing the listening end of the connection, you can set options in the listening TCP/IP socket to limit the number of connections that will be held at the socket level and decide how many of those connections you wish to accept - this would allow you to accept 2 persistent connections or 4 short lived connections as required.

What they mean by, "persistent," is a connection that is opened, and then held open. It's pretty common problem to determine whether it's more expensive to tie up resources with an "always on" connection, or suffer the overhead of opening and closing a connection every time you need it.
It may be worth taking a step back, though.
If you have a server that has to listen for requests from a bunch of clients, you may have a perfect use case for a message-based architecture. If you use tightly-coupled connections like those made with TCP/IP, your clients and servers are going to have to know a lot about each other, and you're going to have to write a lot of low-level connection code.
Under a message-based architecture, your clients could place messages on a queue. The server could then monitor that queue. It could take messages off the queue, perform work, and place the responses back on the queue, where the clients could pick them up.
With such a design, the clients and servers wouldn't have to know anything about each other. As long as they could place properly-formed messages on the queue, and connect to the queue, they could be implemented in totally different languages, and run on different OS's.
Messaging-oriented-middleware like Apache ActiveMQ and Weblogic offer API's you could use from C++ to manage and use queues, and other messaging objects. ActiveMQ is open source, and Weblogic is sold by Oracle (who bought BEA). There are many other great messaging servers out there, so use these as examples, to get you started, if messaging sounds like it's worth exploring.

I think key words are "per destination". Single tcp connection tries to accelerate up to available bandwidth. So if you allow more connections to same destination, they have to share same bandwidth.
This means that each transfer will be slower than it could be and server has to allocate more resources for longer time - data structures for each connection.
Because establishing tcp connection is "time consuming", it makes sense to allow establish second connection in time when you are serving first one, so they are overlapping each other. for short connections setup time could be same as for serving the connection itself (see poor performance example), so more connections are needed for filling all bandwidth effectively.
(sorry I cannot post hyperlinks yet)
here msdn.microsoft.com/en-us/library/ms738559%28VS.85%29.aspx you can see, what is poor performance.
here msdn.microsoft.com/en-us/magazine/cc300760.aspx is some example of threaded server what performs reasonably well.
you can limit number of open connections by limiting number of accept() calls. you can limit number of connections from same source just by canceling connection when you find out, that you allready have more then two connections from this location (just count them).
For example SMTP works in similar way. When there are too many connections, it returns 4xx code and closes your connection.
Also see this question:
What is the best epoll/kqueue/select equvalient on Windows?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js