How to cope with high-frequency data? - C++

I have a C++ application which receives stock data and forwards it to another application via a socket (acting as a server).
The WSASend function returns with error code 10055 after a few seconds, and I found that the error message is
"No buffer space available. An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full".
The problem arises only when I run the application after market hours, as we are receiving the whole day's data (approximately 130 MB) in a few minutes (I assume this is relatively large).
I am doing this as a robustness test.
I tried to increase the send buffer SO_SNDBUF using the setsockopt function, but the same problem is still there.
How can I solve this problem? Is it related to the receiver's buffer?
Sending details:
For each complete message I call the send method which uses overlapped sockets
EDIT:
Can someone give general guidelines to handle high frequency data in C++?

The flow control of TCP will cause the internal send buffer to fill up if the receiver is not processing its end of the socket fast enough. It would seem from the error message that you are sending data without regard for how quickly the Winsock stack can process it. It would be helpful if you could state exactly how you are sending the data. Are you waiting for all the data to arrive and then sending one big block, or sending piecemeal?
Are you sending via a non-blocking or overlapped socket? In either case, after each send you should probably wait for a notification that the socket is in a state where it can send more data: either because select()/WaitForMultipleObjects() indicates it can (for non-blocking sockets), or because the overlapped I/O completes, signalling that the data has been successfully copied to the socket's internal send buffers.
You can overlap sends, i.e. queue up more than one buffer at a time - that's what overlapped I/O is for - but you need to pay careful regard to the memory implications of locking large numbers of pages and potentially exhausting the non-paged pool.

Nick's answer pretty much hits the nail on the head; you're most likely exhausting the 'locked pages limit' by starting too many overlapped sends at once. Ideally you need to buffer your data in your own memory buffers and only have a set number of overlapped sends pending at any one time. I talk about how my IOCP framework allows you to deal with this kind of situation here http://www.lenholgate.com/blog/2008/07/write-completion-flow-control.html and the related TCP receive window flow control issues here http://www.lenholgate.com/blog/2008/06/data-distribution-servers.html and here http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html.
My preferred solution is to allow a configurable number of pending overlapped sends at any one time and once this limit is exceeded to start buffering data and then using the completion of the pending overlapped sends to drive the sending of the buffered data. This allows you to strictly control the amount of non-paged pool and the amount of 'locked pages' used and makes it possible to have lots of connections sending as fast as possible yet still control the resources that they use.
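To make that concrete, here is a minimal sketch of the pattern (this is not the framework from the links above; the names, the limit, and the surrounding IOCP plumbing are assumptions). Something elsewhere must call OnSendCompleted() when it dequeues a completion for one of these WSASend() calls, and should also delete the per-operation SendContext recovered from the OVERLAPPED pointer (e.g. via CONTAINING_RECORD).

    // Sketch only: cap the number of outstanding overlapped WSASend()
    // operations and buffer the rest in our own memory until completions
    // drain the queue.
    #include <winsock2.h>
    #include <deque>
    #include <mutex>
    #include <vector>

    class FlowControlledSender
    {
    public:
        FlowControlledSender(SOCKET s, size_t maxPending)
            : sock_(s), maxPending_(maxPending) {}

        // Called by the application for each complete message.
        void Send(std::vector<char> msg)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_ < maxPending_)
                IssueSend(std::move(msg));          // straight to WSASend()
            else
                queued_.push_back(std::move(msg));  // buffer it ourselves
        }

        // Called from the IOCP completion handler when one of our sends completes.
        void OnSendCompleted()
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --pending_;
            while (pending_ < maxPending_ && !queued_.empty())
            {
                IssueSend(std::move(queued_.front()));
                queued_.pop_front();
            }
        }

    private:
        struct SendContext { WSAOVERLAPPED ov; std::vector<char> data; };

        void IssueSend(std::vector<char> msg)
        {
            // The per-operation context keeps the buffer alive until the
            // completion is dequeued; the completion handler deletes it.
            SendContext* ctx = new SendContext{ {}, std::move(msg) };
            WSABUF buf;
            buf.buf = ctx->data.data();
            buf.len = static_cast<ULONG>(ctx->data.size());
            DWORD sent = 0;
            if (WSASend(sock_, &buf, 1, &sent, 0, &ctx->ov, nullptr) == SOCKET_ERROR &&
                WSAGetLastError() != WSA_IO_PENDING)
            {
                delete ctx;   // real code would report/handle the failure
                return;
            }
            ++pending_;
        }

        SOCKET sock_;
        size_t maxPending_;
        size_t pending_ = 0;
        std::deque<std::vector<char>> queued_;
        std::mutex mutex_;
    };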

Related

Information overload from a websocket

I am just very thankful that Stack Overflow exists; so many questions that would have taken me hours are answered here by experienced people. Thanks everyone :).
One question I have: suppose I am connected to a server via a websocket that sends me data every 1 second, and I am processing that data in a function, call it on_feed(const map_t& m).
Suppose each on_feed call takes 2 seconds; what will happen? Is there an internal queue in the OS that will process the input and queue it?
I hope I am clear; if not: what happens if a server sends data faster than I can process it, since my processing takes time? (I don't want to use my own queue :) )
Thanks!!
A websocket is a TCP socket. TCP sockets have an internal buffer of some unspecified size that holds unread data, until it's read from the socket.
When enough unread data accumulates, TCP's flow control kicks in: the receiver advertises a shrinking (eventually zero) window to the sender, indicating that the peer cannot accept any more data on the socket, and advertises more space once reading resumes and sending can continue.
The sender also has a buffer, of some unspecified size, that holds data written to the socket but not yet transmitted to the peer. The sender's socket buffer will therefore also accumulate some amount of data, and when that buffer is full any ordinary write() to the socket will block until more data can be written.
At that point what happens depends entirely on the application on the other end of the websocket and it is entirely up to the application to figure out what to do next. It may choose to wait forever, or for some indeterminate period of time until it can write more data to the socket; or it may choose to close the socket immediately, it is entirely up to the websocket server.
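To make the blocking behaviour concrete, here is a plain-TCP sketch (no WebSocket framing; the function name and buffer size are invented). If the processing step takes two seconds, unread bytes simply pile up in the kernel's receive buffer, and once it is full TCP flow control makes the sender's writes stall:

    // Plain-TCP illustration: a WebSocket ultimately sits on a socket like
    // this, so the same kernel buffering and flow control apply underneath.
    #include <sys/socket.h>
    #include <chrono>
    #include <thread>
    #include <vector>

    void consume_slowly(int fd)
    {
        std::vector<char> buf(64 * 1024);
        for (;;)
        {
            ssize_t n = recv(fd, buf.data(), buf.size(), 0);  // drain some of the kernel buffer
            if (n <= 0)
                break;                                        // peer closed or error

            // Simulated on_feed() that takes 2 seconds. While we sit here,
            // incoming data queues up in the kernel's receive buffer; once it
            // is full, the advertised window drops to zero and the server's
            // writes block (or fill its own send buffer).
            std::this_thread::sleep_for(std::chrono::seconds(2));
        }
    }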
Is there an internal queue in the OS that will process the input and queue it?
Most operating system kernels do not have built-in support for the WebSocket protocol. They generally only provide TCP/IP support. You will probably be using a WebSocket library which will be linked to your program. This library will likely not be part of the operating system (in the narrow sense). In order to support WebSocket messages, this library will probably have its own input queue.
what happens if a server sends data faster than I can process it, since my processing takes time
If the input queue of the WebSocket library is full, then the WebSocket library will probably stop accepting new data from the server, until more input has been processed and there is room in the queue to accept new input.
Therefore, it should generally not be a problem if a server attempts to send data faster than the client can process it.
If the server software is programmed to send new data sets to the client at a certain rate, but the client is unable to process the data sets at this rate, then the client will probably stop accepting new data after some time, due to its input buffer being full. After that, the server's output buffer will start to fill. If the server software is well-designed, then it should be able to handle this situation well, and it will stop generating data once the output buffer is full, until there is room again in the output buffer.
However, if the server software is not well-designed, then, depending on the situation, it may not be able to cope with this type of problem.
Also, even if the server software is well-designed, it may expect the client to be able to process the WebSocket messages in a timely manner, and the server may decide to abort the connection if the client is taking too long.

When do you need to modify the receive buffer size of sockets?

From time to time I see network related code in legacy source code and elsewhere modifying the receive buffer size for sockets (using setsockopt with the SO_RCVBUF option). On my Windows 10 system the default buffer size for sockets seems to be 64kB. The legacy code I am working on now (written 10+ years ago) sets the receive buffer size to 256kB for each socket.
Some questions related to this:
Is there any reason at all to modify receive buffer sizes when sockets are monitored and read continuously, e.g. using select?
If not, was there some motivation for this 10+ years ago?
Are there any examples, use cases or applications, where modification of receive buffer sizes (or even send buffer sizes) for sockets are needed?
Typically receive-buffer sizes are modified to be larger because the code's author is trying to reduce the likelihood of the condition where the socket's receive-buffer becomes full and therefore the OS has to drop some incoming packets because it has no place to put the data. In a TCP-based application, that condition will cause the stream to temporarily stall until the dropped packets are successfully resent; in a UDP-based application, that condition will cause incoming UDP packets to be silently dropped.
Whether or not doing that is necessary depends on two factors: how quickly data is expected to fill up the socket's receive-buffer, and how quickly the application can drain the socket's receive-buffer via calls to recv(). If the application is reliably able to drain the buffer faster than the data is received, then the default buffer size is fine; OTOH if you see that it is not always able to do so, then a larger receive-buffer-size may help it handle sudden bursts of incoming data more gracefully.
Is there any reason at all to modify receive buffer sizes when sockets are monitored and read continuously, e.g. using select?
There could be, if the incoming data rate is high (e.g. megabytes per second, or even just occasional bursts of data at that rate), or if the thread is doing something between select()/recv() calls that might keep it busy for a significant period of time -- e.g. if the thread ever needs to write to disk, disk-write calls might take several hundred milliseconds in some cases, potentially allowing the socket's receive buffer to fill during that period.
For very high-bandwidth applications, even a very short pause (e.g. due to the thread being kicked off of the CPU for a few quanta, so that another thread can run for a quantum or two) might be enough to allow the buffer to fill up. It depends a lot on the application's use-case, and of course on the speed of the CPU hardware relative to the network.
As for when to start messing with receive-buffer-sizes: don't do it unless you notice that your application is dropping enough incoming packets that it is noticeably limiting your app's network performance. There's no sense allocating more RAM than you need to.
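For reference, a minimal sketch (Winsock-flavoured, since the question mentions Windows; the POSIX version differs only in the optval/optlen types) of requesting a larger receive buffer and checking what the stack actually granted:

    // Sketch: ask for a larger receive buffer and read back what was applied;
    // the OS may round, cap, or (on Linux) double the requested value.
    #include <winsock2.h>

    bool set_receive_buffer(SOCKET s, int requested)
    {
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                       reinterpret_cast<const char*>(&requested), sizeof(requested)) != 0)
            return false;

        int actual = 0;
        int len = sizeof(actual);
        if (getsockopt(s, SOL_SOCKET, SO_RCVBUF,
                       reinterpret_cast<char*>(&actual), &len) != 0)
            return false;

        return actual >= requested;   // 'actual' is what the stack really uses
    }

    // e.g. set_receive_buffer(sock, 256 * 1024);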
For TCP, the RECVBUF is the maximum number of unread bytes that the kernel can hold. In TCP the window size reflects the maximum number of unacknowledged bytes the sender can safely send. The sender receives ACKs that include an updated window, which depends on the free space in the RECVBUF.
When the RECVBUF is full, the sender will stop sending data. This mechanism means the sender cannot send more data than the receiving application can receive.
A small RECVBUF works well on low-latency networks, but on high-bandwidth, high-latency networks the ACKs may take too long to reach the sender; since the sender has run out of window, it cannot make use of the full bandwidth.
Increasing the RECVBUF size increases the window, which means the sender can send more data while waiting for an ACK; this then allows the sender to make use of the entire bandwidth. It does mean that things are less responsive.
Shrinking the RECVBUF makes things more responsive: the sender becomes aware sooner that the receiver is not consuming the data and can back off much more quickly.
The same logic applies for the SENDBUF as well.
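As a rough worked example (numbers assumed purely for illustration): on a 100 Mbit/s path with a 50 ms round-trip time, the bandwidth-delay product is (100,000,000 / 8) bytes/s × 0.05 s ≈ 625 KB. With only a 64 KB window, the sender can have at most 64 KB in flight per round trip, i.e. about 64 KB / 0.05 s ≈ 1.3 MB/s (roughly 10 Mbit/s), no matter how fast the link is; a receive buffer at or above the bandwidth-delay product removes that cap.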

How to determine how much of the socket receive buffer is filled?

In my application, I am trying to send and receive datagrams to and from different servers as quickly as possible. I noticed that increasing the sending rate greatly lowers the percentage of responses I receive for a given number of sends. I believe one cause could be that recvfrom()'s queue is filling faster than I can call the function to read the packets, reaching its limit and dropping a lot of packets. Is there a way I can check how full the receive buffer is for a socket?
This is on Windows, not Linux, and I would prefer a programmatic approach.
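One possible programmatic approach on Windows (a sketch, with the function name invented): ioctlsocket() with FIONREAD reports how many bytes are currently queued for reading, and getsockopt(SO_RCVBUF) reports the buffer's nominal capacity, so the ratio gives a rough fill level.

    // Sketch: approximate fill level of a socket's receive buffer on Windows.
    #include <winsock2.h>

    double receive_buffer_fill(SOCKET s)
    {
        u_long queued = 0;
        if (ioctlsocket(s, FIONREAD, &queued) != 0)
            return -1.0;                       // error

        int capacity = 0;
        int len = sizeof(capacity);
        if (getsockopt(s, SOL_SOCKET, SO_RCVBUF,
                       reinterpret_cast<char*>(&capacity), &len) != 0 || capacity <= 0)
            return -1.0;

        return static_cast<double>(queued) / capacity;  // 0.0 = empty, ~1.0 = nearly full
    }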

How to send and receive data up to SO_SNDTIMEO and SO_RCVTIMEO without corrupting connection?

I am currently planning how to develop a man-in-the-middle network application for a TCP server that would transfer data between the server and a client. It would behave as a regular client for the server and as a server for the remote client, without modifying any data. It will optionally be used to detect and measure how long the server or the client is unable to receive data that is ready to be received, in situations when the connection is inactive.
I am planning to use the blocking send and recv functions. Before any data transfer I would call setsockopt to set SO_SNDTIMEO and SO_RCVTIMEO to about 10-20 milliseconds, assuming this will force the blocking send and recv functions to return early, in order to let another active connection's data be routed. Running a thread per connection looks too expensive. I would not use async sockets here because I cannot find a guarantee that they will complete within a fraction of a second, especially when a large amount of data is being sent or received. High data delays do not look good. I would use very small buffers here, but calling the function for each received byte looks like overkill.
My next assumption would be that it is safe to call send or recv again later if a previous call was terminated by a timeout and less data than requested was sent or received.
But I am confused by contradictory information available on MSDN.
send function
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740149%28v=vs.85%29.aspx
If no error occurs, send returns the total number of bytes sent, which
can be less than the number requested to be sent in the len parameter.
SOL_SOCKET Socket Options
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740532%28v=vs.85%29.aspx
SO_SNDTIMEO - The timeout, in milliseconds, for blocking send calls.
The default for this option is zero, which indicates that a send
operation will not time out. If a blocking send call times out, the
connection is in an indeterminate state and should be closed.
Are my assumptions correct that I can use these functions like this? Maybe there is a more effective way to do this?
Thanks for your answers.
While you MIGHT implement something along the ideas you have given in your question, there are preferable alternatives on all major systems.
Namely:
kqueue on FreeBSD and family, and on macOS.
epoll on Linux and related operating systems.
IO completion ports on Windows.
Using those technologies allows you to process traffic on multiple sockets in an efficient, reactive manner, without timeout logic or polling. They can all be considered successors of the ancient select() function in the socket API.
As for the quoted documentation for send() in your question, it is not really confusing or contradictory. Useful network protocols implement a mechanism to create "backpressure" for situations where a sender tries to send more data than the receiver (and/or the transport channel) can accommodate. So, an application can only provide more data to send() if the network stack has buffer space ready for it.
If, for example, an application tries to send 3 KB worth of data and the TCP/IP stack has only room for 800 bytes, send() might succeed and return that it used 800 of the 3 KB offered.
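Handling that case is just a loop that keeps offering the remainder (a blocking-socket sketch with an invented name; a non-blocking or overlapped variant would instead wait for writability or a completion before retrying):

    // Sketch: keep calling send() until the whole buffer has been accepted
    // by the stack, coping with partial sends like the 800-of-3-KB case above.
    #include <winsock2.h>

    bool send_all(SOCKET s, const char* data, size_t len)
    {
        size_t off = 0;
        while (off < len)
        {
            int n = send(s, data + off, static_cast<int>(len - off), 0);
            if (n == SOCKET_ERROR)
                return false;    // real code would inspect WSAGetLastError()
            off += static_cast<size_t>(n);
        }
        return true;
    }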
The basic approach to forwarding the data on a connection is: do not read from the incoming socket until you know you can send that data to the outgoing socket. If you read greedily (and buffer at the application layer), you deprive the communication channel of its backpressure mechanism.
So basically, the "send capability" should drive the receive actions.
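Here is a sketch of that principle for one direction of the relay, using plain poll() for brevity (the same structure maps onto epoll/kqueue readiness events or IOCP completions; names and sizes are invented): the source socket is only polled for readability while the relay buffer is empty, and the destination is only polled for writability while there is buffered data to flush.

    // Sketch of one direction of the relay (source -> destination). A real
    // proxy would run both directions and handle errors and shutdown properly.
    #include <poll.h>
    #include <sys/socket.h>

    void relay_one_direction(int src, int dst)
    {
        char buf[16 * 1024];
        size_t have = 0, sent = 0;

        for (;;)
        {
            pollfd fds[2] = { { src, 0, 0 }, { dst, 0, 0 } };
            if (have == 0)
                fds[0].events = POLLIN;    // nothing buffered: willing to read more
            else
                fds[1].events = POLLOUT;   // data buffered: wait until we can write

            if (poll(fds, 2, -1) < 0)
                return;

            if (fds[0].revents & POLLIN)
            {
                ssize_t n = recv(src, buf, sizeof(buf), 0);
                if (n <= 0)
                    return;                // EOF or error: stop relaying
                have = static_cast<size_t>(n);
                sent = 0;
            }

            if (fds[1].revents & POLLOUT)
            {
                ssize_t n = send(dst, buf + sent, have - sent, 0);
                if (n < 0)
                    return;
                sent += static_cast<size_t>(n);
                if (sent == have)
                    have = 0;              // buffer drained: resume reading
            }
        }
    }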
As for using timeouts for this "middle man", there are 2 major scenarios:
You know the sending behavior of the sender application, i.e. whether it can be expected to send any data within your chosen receive timeout at any given time. Some applications only send sporadically, and any chosen value for a receive timeout could be wrong. Even if it is supposed to send at a specific time interval, your timeouts will cause trouble once someone debugs the sending application.
You want the "middle man" to work for unknown applications (which must not use encryption, of course, for the middle man to have a chance). There, you cannot pick any "adequate" timeout value because you know nothing about the sending behavior of the involved application(s).
As a previous poster has suggested, I strongly urge you to reconsider the design of your server so that it employs an asynchronous I/O strategy. This may very well require that you spend significant time learning about each operating systems' preferred approach. It will be time well-spent.
For anything other than a toy application, using blocking I/O in the manner that you suggest will not perform well. Even with short timeouts, it sounds to me as though you won't be able to service new connections until you have completed the work for the current connection. You may also find (with short timeouts) that you're burning more CPU time spinning waiting for work to do than actually doing work.
A previous poster wisely suggested taking a look at Windows I/O completion ports. Take a look at this article I wrote in 2007 for Dr. Dobbs. It's not perfect, but I try to do a decent job of explaining how you can design a simple server that uses a small thread pool to handle potentially large numbers of connections:
Windows I/O Completion Ports
http://www.drdobbs.com/cpp/multithreaded-asynchronous-io-io-comple/201202921
If you're on Linux/FreeBSD/MacOSX, take a look at libevent:
Libevent
http://libevent.org/
Finally, a good, practical book on writing TCP/IP servers and clients is "TCP/IP Sockets in C: Practical Guide for Programmers" by Michael Donahoo and Kenneth Calvert. You could also check out the W. Richard Stevens texts (which cover the topic thoroughly for UNIX).
In summary, I think you should take some time to learn more about asynchronous socket I/O and the established, best-of-breed approaches for developing servers.
Feel free to private message me if you have questions down the road.

C++: How to measure real upload rate on non-blocking sockets

I'm writing a program on linux C++ using non-blocking sockets with epoll, waiting for EPOLLOUT in order to do send() for some data.
My question is: I've read that on non-blocking mode the data is copied to the kernel's buffer, thus a send() call may return immediately indicating that all the data has been sent, where in reality it was only copied to the kernel's buffer.
How do I know when the data was actually sent and received by the remote peer, for knowing the real transfer rate?
Whether in non-blocking mode or not, send will return as soon as the data is copied into the kernel buffer. The difference between blocking and non-blocking mode is what happens when the buffer is full: blocking mode will suspend the current thread until space becomes available for the write, while non-blocking mode will return immediately with EAGAIN or EWOULDBLOCK.
In a TCP connection, the kernel buffer normally is equal to the window size, so as soon as too much data remains unacknowledged, the connection blocks. This means that the sender is aware of how fast the remote end is receiving data.
With UDP it is a bit more complex because there are no acknowledgements. Here only the receiving end is capable of measuring the true speed, since sent data may be lost en route.
In both the TCP and UDP cases, the kernel will not attempt to send data that the link layer is unable to process. The link layer can also apply flow control if the network is congested.
Getting back to your case, when using non-blocking sockets, you can measure the network speed provided you handle the EAGAIN or EWOULDBLOCK errors correctly. This is certainly true for TCP where you send more data than the current window size (probably 64K or so) and you can get an idea of the link layer speed with UDP sockets as well.
You can get the current amount of data in the kernel's socket buffers using an ioctl. This would allow you to check what has actually been sent. I'm not sure it matters that much, though; unless you have MASSIVE buffers and a tiny amount of data to send, it's probably not of interest.
Investigate the TIOCOUTQ/TIOCINQ ioctl on your socket fd.
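On Linux that looks roughly like the following (a sketch with an invented function name; SIOCOUTQ/SIOCINQ from <linux/sockios.h> are the socket-level spellings of TIOCOUTQ/TIOCINQ, documented in tcp(7)):

    // Sketch: query how many bytes are still sitting in the kernel's socket
    // buffers (SIOCOUTQ = outgoing data not yet acknowledged/sent,
    // SIOCINQ = incoming data not yet read by the application).
    #include <sys/ioctl.h>
    #include <linux/sockios.h>

    bool socket_queue_sizes(int fd, int& out_pending, int& in_pending)
    {
        if (ioctl(fd, SIOCOUTQ, &out_pending) != 0)
            return false;
        if (ioctl(fd, SIOCINQ, &in_pending) != 0)
            return false;
        return true;
    }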
My question is: I've read that on non-blocking mode the data is copied to the kernel's buffer
That happens in all modes, not just non-blocking mode. I suggest you review your reading matter.
thus a send() call may return immediately indicating that all the data has been sent, where in reality it was only copied to the kernel's buffer.
Again that is true in all modes.
How do I know when the data was actually sent and received by the remote peer, for knowing the real transfer rate?
When you've sent all the data, shutdown the socket for output, then either set blocking mode and read, or keep selecting for 'readable'; and then in either case read the EOS that should result. That functions as a peer acknowledgement of the close. Then stop the timer.
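A blocking-mode sketch of that sequence (the name is invented, and the start time is assumed to have been taken before the first send):

    // Sketch: after the last send(), shut down our side and wait for the
    // peer's EOF; recv() returning 0 acts as the peer's acknowledgement that
    // it has consumed everything and seen our close.
    #include <sys/socket.h>
    #include <chrono>

    double transfer_seconds(int fd, std::chrono::steady_clock::time_point started_sending)
    {
        shutdown(fd, SHUT_WR);            // no more output from our side
        char c;
        while (recv(fd, &c, 1, 0) > 0)    // socket assumed to be in blocking mode here
            ;                             // discard anything the peer still sends
        auto done = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(done - started_sending).count();
    }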
send() merely puts data into the kernel's buffer and then exits, letting the kernel perform the actual transmission in the background, so all you can really do is measure the speed at which the kernel is accepting your outgoing data. You can't really measure the actual transmission speed unless the peer sends an acknowledgement for every buffer received (and there is no way to detect when TCP's own ACKs are received). But using the fact that send() can block when too much data is still in flight can help you figure out how fast your code is passing outgoing data to send().
send() tells you how many bytes were accepted, so it is very easy to calculate an approximate acceptance speed: divide the number of bytes accepted by the amount of time elapsed since the previous call to send(). So if you call send() to send X bytes and get Y bytes returned, recording the time as time1, then call send() again to send X bytes and get Y bytes returned, recording the time as time2, your code is sending data at roughly Y / (time2 - time1 in ms) bytes per millisecond, which you can then use to calculate B/KB/MB/GB per ms/sec/min/hr as needed. Over the lifetime of the data transfer, that gives you a fairly good idea of your app's general transmission speed.
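A small sketch of that bookkeeping (the class and function names are invented; this measures the rate at which the stack accepts data, not the wire speed):

    // Sketch: track the rate at which send() is accepting data across calls,
    // i.e. bytes accepted divided by the time since the previous call.
    #include <sys/socket.h>
    #include <chrono>

    class AcceptanceMeter
    {
    public:
        // Returns the acceptance rate in bytes/second for this call,
        // or a negative value on send() error.
        double send_and_measure(int fd, const char* data, size_t len)
        {
            ssize_t accepted = send(fd, data, len, 0);
            auto now = std::chrono::steady_clock::now();
            if (accepted < 0)
                return -1.0;

            double rate = 0.0;
            if (have_prev_)
            {
                double secs = std::chrono::duration<double>(now - prev_).count();
                if (secs > 0.0)
                    rate = accepted / secs;
            }
            prev_ = now;
            have_prev_ = true;
            return rate;
        }

    private:
        std::chrono::steady_clock::time_point prev_{};
        bool have_prev_ = false;
    };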