Why does setting SO_SNDBUF and SO_RCVBUF destroy performance?

Running in Docker on macOS, I have a simple server and client setup to measure how fast I can allocate data on the client and send it to the server. The tests run over loopback (both ends in the same Docker container). The message size for my tests was 1000000 bytes.
When I set SO_RCVBUF and SO_SNDBUF to their respective defaults, the performance halves.
SO_RCVBUF defaults to 65536 and SO_SNDBUF defaults to 1313280 (retrieved by calling getsockopt and dividing by 2).
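For reference, this is roughly how I read those values back (a minimal sketch on a fresh socket; I halve what getsockopt reports because the kernel stores a doubled, bookkeeping-inclusive value):
#include <cstdio>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int val = 0;
    socklen_t len = sizeof(val);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len);
    printf("SO_RCVBUF as reported: %d (halved: %d)\n", val, val / 2);
    len = sizeof(val);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len);
    printf("SO_SNDBUF as reported: %d (halved: %d)\n", val, val / 2);
    close(fd);
    return 0;
}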
Tests:
When I set neither buffer size, I get about 7 Gb/s of throughput.
When I set one buffer or the other to its default (or higher), I get 3.5 Gb/s.
When I set both buffer sizes to their defaults, I get 2.5 Gb/s.
Server code: (cs is an accepted stream socket)
void tcp_rr(int cs, uint64_t& processed) {
    /* I remove this entire thing and performance improves */
    if (setsockopt(cs, SOL_SOCKET, SO_RCVBUF, &ENV.recv_buf, sizeof(ENV.recv_buf)) == -1) {
        perror("RCVBUF failure");
        return;
    }
    char* buf = (char*)malloc(ENV.msg_size);
    while (true) {
        int recved = 0;
        while (recved < ENV.msg_size) {
            int recvret = recv(cs, buf + recved, ENV.msg_size - recved, 0);
            if (recvret <= 0) {
                if (recvret < 0) {
                    perror("Recv error");
                }
                free(buf);
                return;
            }
            processed += recvret;
            recved += recvret;
        }
    }
    free(buf);
}
Client code: (s is a connected stream socket)
void tcp_rr(int s, uint64_t& processed, BenchStats& stats) {
    /* I remove this entire thing and performance improves */
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &ENV.send_buf, sizeof(ENV.send_buf)) == -1) {
        perror("SNDBUF failure");
        return;
    }
    char* buf = (char*)malloc(ENV.msg_size);
    while (stats.elapsed_millis() < TEST_TIME_MILLIS) {
        int sent = 0;
        while (sent < ENV.msg_size) {
            int sendret = send(s, buf + sent, ENV.msg_size - sent, 0);
            if (sendret <= 0) {
                if (sendret < 0) {
                    perror("Send error");
                }
                free(buf);
                return;
            }
            processed += sendret;
            sent += sendret;
        }
    }
    free(buf);
}
Zeroing in on SO_SNDBUF:
The default appears to be: net.ipv4.tcp_wmem = 4096 16384 4194304
If I setsockopt to 4194304 and then getsockopt (to see what's actually set), it returns 425984 (roughly a tenth of what I requested).
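A minimal standalone repro of the clamp (sketch, run in the same container):
#include <cstdio>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int requested = 4194304; // tcp_wmem max from the sysctl above
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested));
    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len);
    printf("requested %d, got %d\n", requested, actual); // prints 425984 here
    close(fd);
    return 0;
}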
Additionally, it appears that calling setsockopt sets a lock on buffer expansion (for send, the lock's name is SOCK_SNDBUF_LOCK, which prohibits adaptive expansion of the buffer). The question then is: why can't I request the full-size buffer?

Clues about what is going on come from the kernel's handler for SO_SNDBUF (and SO_RCVBUF, but we'll focus on SO_SNDBUF below).
net/core/sock.c contains the implementations of the generic socket operations, including the SOL_SOCKET getsockopt and setsockopt handlers.
Examining what happens when we call setsockopt(s, SOL_SOCKET, SO_SNDBUF, ...):
	case SO_SNDBUF:
		/* Don't error on this BSD doesn't and if you think
		 * about it this is right. Otherwise apps have to
		 * play 'guess the biggest size' games. RCVBUF/SNDBUF
		 * are treated in BSD as hints
		 */
		val = min_t(u32, val, sysctl_wmem_max);
set_sndbuf:
		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
		sk->sk_sndbuf = max_t(int, val * 2, SOCK_MIN_SNDBUF);
		/* Wake up sending tasks if we upped the value. */
		sk->sk_write_space(sk);
		break;

	case SO_SNDBUFFORCE:
		if (!capable(CAP_NET_ADMIN)) {
			ret = -EPERM;
			break;
		}
		goto set_sndbuf;
Some interesting things pop out.
First of all, we see that the maximum possible value is sysctl_wmem_max (net.core.wmem_max), a setting which can be difficult to pin down from within a Docker container. We know from the context above that it is likely 212992: half the value you read back after trying to set 4194304.
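You can check the cap directly from procfs inside the container (a quick sketch, assuming /proc is mounted as usual):
#include <cstdio>

int main() {
    // net.core.wmem_max is the cap applied to SO_SNDBUF requests (before doubling).
    FILE* f = fopen("/proc/sys/net/core/wmem_max", "r");
    if (f) {
        long wmem_max = 0;
        if (fscanf(f, "%ld", &wmem_max) == 1)
            printf("net.core.wmem_max = %ld\n", wmem_max); // expect 212992 here
        fclose(f);
    }
    return 0;
}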
Secondly, we see SOCK_SNDBUF_LOCK being set. This setting is in my opinion not well documented in the man pages, but it appears to lock dynamic adjustment of the buffer size.
For example, in the function tcp_should_expand_sndbuf we get:
static bool tcp_should_expand_sndbuf(const struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	/* If the user specified a specific send buffer setting, do
	 * not modify it.
	 */
	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
		return false;
	...
So what is happening in your code? You attempt to set what you understand to be the max value, but it is clamped to something roughly ten times smaller by sysctl_wmem_max. This is then made far worse by the fact that setting the option now locks the buffer at that smaller size. Dynamic resizing does not have this restriction: TCP's auto-tuning is bounded by the max of net.ipv4.tcp_wmem (4194304 here) rather than by net.core.wmem_max, so an untouched socket can grow all the way to the size you were trying to set.
If you look at the first code snippet above, you see the SO_SNDBUFFORCE option. It disregards sysctl_wmem_max and lets you set essentially any buffer size, provided you have the right permissions.
It turns out that processes launched in generic Docker containers don't have CAP_NET_ADMIN, so to use this socket option you must run in --privileged mode. However, if you do, and you force the max size, you will see your benchmark return the same throughput as not setting the option at all and letting the buffer grow dynamically to the same size.
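For completeness, using it looks something like this (a sketch; s is your connected socket, and the fallback path is still subject to the wmem_max clamp):
int want = 4194304;
// SO_SNDBUFFORCE ignores net.core.wmem_max but requires CAP_NET_ADMIN.
if (setsockopt(s, SOL_SOCKET, SO_SNDBUFFORCE, &want, sizeof(want)) == -1) {
    perror("SO_SNDBUFFORCE (missing CAP_NET_ADMIN?)");
    // Unprivileged fallback: accepted, but silently clamped to wmem_max.
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want));
}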

Related

getsockopt() returns a value twice the value that was previously set by setsockopt()

I am attempting to increase the SO_RCVBUF for a raw socket I am using to interact with a Linux device driver. The default rmem_default/rmem_max are both too small: 163840. Therefore I am using the following Stack Overflow questions/answers to help me. Everything is working, or at least it looks like it. However, when I get back the value I set for SO_RCVBUF, it returns double the value I set. Does anyone know why that is?
int recv_val = SOCK_RCV_BUF_MAX;
socklen_t size = sizeof(recv_val);

if (setsockopt(sock_fd, SOL_SOCKET, SO_RCVBUF, &recv_val, size) < 0)
{
    fprintf(stderr, "Error setsockopt(SO_RCVBUF): %s\n", strerror(errno));
}
else
    printf("Set the SO_RCVBUF to %d\n", recv_val);

recv_val = 0;
if (getsockopt(sock_fd, SOL_SOCKET, SO_RCVBUF, &recv_val, &size) < 0)
{
    fprintf(stderr, "Error getsockopt(SO_RCVBUF): %s\n", strerror(errno));
}
else if (recv_val == SOCK_RCV_BUF_MAX)
{
    printf("Successfully set the buffer max to %d\n", SOCK_RCV_BUF_MAX);
}
else
    printf("Failed to set the buffer to max (%d), val = %d\n", SOCK_RCV_BUF_MAX, recv_val);
Output
Set the SO_RCVBUF to 64000000
Failed to set the buffer to max (64000000), val = 128000000
Changing to recv_val = SOCK_RCV_BUF_MAX/2 outputs
Set the SO_RCVBUF to 32000000
Successfully set the buffer max to 64000000
If I don't set the value using setsockopt() and just call getsockopt() on my socket, I get the correct default value:
Failed to set the buffer to max (64000000), val = 163840
The value you give to setsockopt(SO_RCVBUF) is only a hint, not an absolute. The socket provider is allowed to use a different value if it wants to. What you get back from getsockopt(SO_RCVBUF) is the actual value used.
What you are seeing happen is actually documented behavior:
http://man7.org/linux/man-pages/man7/socket.7.html
SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes. The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/rmem_default file, and the maximum allowed value is set by the /proc/sys/net/core/rmem_max file. The minimum (doubled) value for this option is 256.
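So if you want an effective limit of N bytes, one option is to request N/2 and verify what getsockopt() reports afterwards. A minimal sketch:
// Sketch: request half the target, since the kernel doubles the value it stores.
// Still subject to the /proc/sys/net/core/rmem_max cap for unprivileged callers.
bool set_rcvbuf_exact(int fd, int target_bytes) {
    int half = target_bytes / 2;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &half, sizeof(half)) < 0)
        return false;
    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len);
    return actual == target_bytes;
}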

Does v4l2 camera capture with mmap ring buffer make sense for tracking application

I'm working with the V4L2 API for capturing images from a raw sensor on an embedded platform. My capture routine is based on the example at [1].
For initialization, buffers (default = 4 buffers) are requested via ioctl with the VIDIOC_REQBUFS identifier. Subsequently, they are queued using VIDIOC_QBUF. The entire streaming procedure is described at [2]. As soon as streaming starts, the driver fills the queued buffers with data. The timestamp of the v4l2_buffer struct indicates the time the first byte was captured, which in my case gives an interval of approximately 8.3 ms (= 120 fps) between buffers. So far so good.
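For context, my initialization follows the usual pattern (a condensed sketch with error handling omitted; fd is the opened device):
#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

// Request 4 mmap buffers, map and queue each one, then start streaming.
struct v4l2_requestbuffers req = {};
req.count  = 4;
req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
req.memory = V4L2_MEMORY_MMAP;
ioctl(fd, VIDIOC_REQBUFS, &req);

for (unsigned i = 0; i < req.count; ++i) {
    struct v4l2_buffer buf = {};
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = i;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);   // fills in buf.length and buf.m.offset
    void* ptr = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, buf.m.offset);
    // (store ptr/buf.length per index, check for MAP_FAILED)
    ioctl(fd, VIDIOC_QBUF, &buf);       // hand the buffer to the driver
}
enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
ioctl(fd, VIDIOC_STREAMON, &type);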
Now, what I would expect of a ring buffer is that new captures automatically overwrite older ones in a circular fashion. But this is not what happens. A new frame is only assigned to a buffer once that buffer has been dequeued (VIDIOC_DQBUF), processed (demosaic, tracking step, ...) and queued again (VIDIOC_QBUF). If I do meet the timing condition (processing < 8.3 ms), dequeuing gives me not the latest captured frame but the oldest one (FIFO), i.e. the one from 3 x 8.3 ms before the current one. If the timing condition is not met, the time span gets even larger, as the buffers are not overwritten.
So I have several questions:
1. Does it even make sense for this tracking application to have a ring buffer, given that I don't really need a history of frames? I doubt it, but with the proposed mmap method most drivers require a minimum number of buffers to be requested.
2. Should a separate thread continuously DQBUF and QBUF to accomplish the buffer overwrite? How could this be accomplished?
3. As a workaround, one could probably dequeue and requeue all buffers on every capture, but this doesn't sound right. Can someone with more experience in real-time capture and streaming point to the "proper" way to go?
4. Currently I do the preprocessing step (demosaicing) between DQBUF and QBUF, and the tracking step afterwards. Should the tracking step also be executed before QBUF is called again?
So the main code basically performs Capture() and then Track() in a while loop. The Capture routine looks as follows:
cv::Mat v4l2Camera::Capture( size_t timeout )
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(mFD, &fds);

    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    const bool threaded = true; //false;

    // proper register settings
    this->format2registerSetting();

    if( timeout > 0 )
    {
        tv.tv_sec = timeout / 1000;
        tv.tv_usec = (timeout - (tv.tv_sec * 1000)) * 1000;
    }

    const int result = select(mFD + 1, &fds, NULL, NULL, &tv);
    if( result == -1 )
    {
        //if (EINTR == errno)
        printf("v4l2 -- select() failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }
    else if( result == 0 )
    {
        if( timeout > 0 )
            printf("v4l2 -- select() timed out...\n");
        return cv::Mat(); // timeout, not necessarily an error (TRY_AGAIN)
    }

    // dequeue input buffer from V4L2
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(v4l2_buffer));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP; //V4L2_MEMORY_USERPTR;

    if( xioctl(mFD, VIDIOC_DQBUF, &buf) < 0 )
    {
        printf("v4l2 -- ioctl(VIDIOC_DQBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }

    if( buf.index >= mBufferCountMMap )
    {
        printf("v4l2 -- invalid mmap buffer index (%u)\n", buf.index);
        return cv::Mat();
    }

    // emit ringbuffer entry
    printf("v4l2 -- received %ux%u video frame (index=%u)\n", mWidth, mHeight, (uint32_t)buf.index);
    void* image_ptr = mBuffersMMap[buf.index].ptr;

    // frame processing (& tracking step)
    cv::Mat demosaic_mat = demosaic(image_ptr, mSize, mDepth, 1);

    // re-queue buffer to V4L2
    if( xioctl(mFD, VIDIOC_QBUF, &buf) < 0 )
        printf("v4l2 -- ioctl(VIDIOC_QBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));

    return demosaic_mat;
}
As my knowledge is limited regarding capturing and streaming video I appreciate any help.

Windows C++ Intermittent Socket Disconnect

I've got a server that uses a two-thread system to manage between 100 and 200 concurrent connections. It uses TCP sockets, as packet delivery guarantees are important (it's a communication system where missed remote API calls could FUBAR a client).
I've implemented a custom protocol layer to separate incoming bytes into packets and dispatch them properly (the library is included below). I realize the issues with using MSG_PEEK, but to my knowledge it is the only approach that fulfills the needs of the library implementation. I am open to suggestions, especially if this could be part of the problem.
Basically, the problem is that, randomly, the server will drop a client's socket due to a lack of incoming packets for more than 20 seconds, despite the client successfully sending a keepalive packet every 4 seconds. I can verify that the server itself didn't go offline and that the connections of the affected users (including myself) are stable.
The library for sending/receiving is here:
short ncsocket::send(wstring command, wstring data) {
    wstringstream ss;
    int datalen = ((int)command.length() * 2) + ((int)data.length() * 2) + 12;
    ss << zero_pad_int(datalen) << L"|" << command << L"|" << data;
    int tosend = datalen;
    short __rc = 0;
    do {
        int res = ::send(this->sock, (const char*)ss.str().c_str(), datalen, NULL);
        if (res != SOCKET_ERROR)
            tosend -= res;
        else
            return FALSE;
        __rc++;
        Sleep(10);
    } while (tosend != 0 && __rc < 10);
    if (tosend == 0)
        return TRUE;
    return FALSE;
}
short ncsocket::recv(netcommand& nc) {
    vector<wchar_t> buffer(BUFFER_SIZE);
    int recvd = ::recv(this->sock, (char*)buffer.data(), BUFFER_SIZE, MSG_PEEK);
    if (recvd > 0) {
        if (recvd > 8) {
            wchar_t* lenstr = new wchar_t[4];
            memcpy(lenstr, buffer.data(), 8);
            int fulllen = _wtoi(lenstr);
            delete lenstr;
            if (fulllen > 0) {
                if (recvd >= fulllen) {
                    buffer.resize(fulllen / 2);
                    recvd = ::recv(this->sock, (char*)buffer.data(), fulllen, NULL);
                    if (recvd >= fulllen) {
                        buffer.resize(buffer.size() + 2);
                        buffer.push_back((char)L'\0');
                        vector<wstring> data = parsewstring(L"|", buffer.data(), 2);
                        if (data.size() == 3) {
                            nc.command = data[1];
                            nc.payload = data[2];
                            return TRUE;
                        }
                        else
                            return FALSE;
                    }
                    else
                        return FALSE;
                }
                else
                    return FALSE;
            }
            else {
                ::recv(this->sock, (char*)buffer.data(), BUFFER_SIZE, NULL);
                return FALSE;
            }
        }
        else
            return FALSE;
    }
    else
        return FALSE;
}
This is the code for determining if too much time has passed:
if ((int)difftime(time(0), regusrs[i].last_recvd) > SERVER_TIMEOUT) {
    regusrs[i].sock.end();
    regusrs[i].is_valid = FALSE;
    send_to_all(L"removeuser", regusrs[i].server_user_id);
    wstringstream log_entry;
    log_entry << regusrs[i].firstname << L" " << regusrs[i].lastname
              << L" (suid:" << regusrs[i].server_user_id
              << L",p:" << regusrs[i].parent
              << L",pid:" << regusrs[i].parentid
              << L") was disconnected due to idle";
    write_to_log_file(server_log, log_entry.str());
}
The "regusrs[i]" is using the currently iterated member of a vector I use to story socket descriptors and user information. The 'is_valid' check is there to tell if the associated user is an actual user - this is done to prevent the system from having to deallocate the member of the vector - it just returns it to the pool of available slots. No thread access/out-of-range issues that way.
Anyway, I started to wonder if it was the server itself was the problem. I'm testing on another server currently, but I wanted to see if another set of eyes could stop something out of place or cue me in on a concept with sockets and extended keepalives that I'm not aware of.
Thanks in advance!
I think I see what you're doing with MSG_PEEK, where you wait until it looks like you have enough data to read a full packet. However, I would be suspicious of this. (It's hard to determine the dynamic behaviour of your system just by looking at this small part of the source and not the whole thing.)
To avoid use of MSG_PEEK, follow these two principles:
When you get a notification that data is ready (I assume you're using select), then read all the waiting data from recv(). You may use more than one recv() call, so you can handle the incoming data in pieces.
If you read only a partial packet (length or payload), then save it somewhere for the next time you get a read notification. Put the packets and payloads back together yourself, don't leave them in the socket buffer.
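A rough sketch of that pattern, using your length-prefix framing (a fragment, illustrative only; dispatch_packet is a hypothetical handler):
// Persistent accumulator: bytes survive across read notifications.
std::vector<char> acc;

void on_readable(SOCKET sock) {
    char tmp[4096];
    int n = ::recv(sock, tmp, sizeof(tmp), 0); // no MSG_PEEK: consume the data
    if (n <= 0)
        return; // closed or error; handle elsewhere
    acc.insert(acc.end(), tmp, tmp + n); // level-triggered select fires again if more remains

    // Peel off complete packets. The first 4 wide chars (8 bytes) are the
    // zero-padded total length, followed by '|', at which _wtoi stops.
    while (acc.size() >= 8) {
        wchar_t lenstr[5] = {}; // zero-terminated local, no new/delete
        memcpy(lenstr, acc.data(), 8);
        int fulllen = _wtoi(lenstr);
        if (fulllen <= 0)
            return; // malformed stream; drop the connection
        if ((int)acc.size() < fulllen)
            break;  // partial packet: keep the bytes for the next notification
        dispatch_packet(acc.data(), fulllen); // hypothetical handler
        acc.erase(acc.begin(), acc.begin() + fulllen);
    }
}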
As an aside, the use of new/memcpy/wtoi/delete is woefully inefficient. You don't need to allocate memory at all, you can use a local variable. And then you don't even need the memcpy at all, just a cast.
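For example, the length parse in your recv() collapses to:
// buffer is already a vector<wchar_t>, and the length field is followed by
// L'|', at which _wtoi stops parsing, so no copy or terminator is needed.
int fulllen = _wtoi(buffer.data());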
I presume you already assume that your packets can be no longer than 999 bytes in length.

Linux poll on serial transmission end

I'm implementing RS-485 on an ARM development board, using a serial port plus a GPIO pin for data enable.
I'm setting data enable to high before sending and I want it to be set low after transmission is complete.
It can be simply done by writing:
//fd = open("/dev/ttyO2", ...);
DataEnable.Set(true);
write(fd, data, datalen);
tcdrain(fd); //Wait until all data is sent
DataEnable.Set(false);
I wanted to change from blocking mode to non-blocking and use poll() with the fd. But I don't see any poll event corresponding to 'transmission complete'.
How can I get notified when all data has been sent?
System: Linux
Language: C++
Board: BeagleBone Black
I don't think it's possible. You'll either have to run tcdrain in another thread and have it notify the main thread, or use a timeout on poll() and check whether the output has drained.
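A minimal sketch of the thread variant (assuming C++11 threads; how you wake the main loop, e.g. a self-pipe or eventfd, is up to you):
#include <atomic>
#include <thread>
#include <termios.h>

std::atomic<bool> drained{false};

void drain_in_background(int fd) {
    // tcdrain() blocks until the kernel reports the output shifted out;
    // run it off the main thread so the poll loop stays responsive.
    std::thread([fd] {
        tcdrain(fd);
        drained = true; // main loop notices and drops DataEnable
    }).detach();
}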
You can use the TIOCOUTQ ioctl to get the number of bytes in the output buffer and tune the timeout according to baud rate. That should reduce the amount of polling you need to do to just once or twice. Something like:
enum { writing, draining, idle } write_state;

while (1) {
    int write_event, timeout = -1;
    ...
    if (write_state == writing) {
        poll_fds[poll_len].fd = write_fd;
        poll_fds[poll_len].events = POLLOUT;
        write_event = poll_len++;
    } else if (write_state == draining) {
        int outq;
        ioctl(write_fd, TIOCOUTQ, &outq);
        if (outq == 0) {
            DataEnable.Set(false);
            write_state = idle;
        } else {
            // 10 bits per byte, 1000 milliseconds in a second
            timeout = outq * 10 * 1000 / baud_rate;
            if (timeout < 1) {
                timeout = 1;
            }
        }
    }

    int r = poll(poll_fds, poll_len, timeout);
    ...
    if (write_state == writing && r > 0 && (poll_fds[write_event].revents & POLLOUT)) {
        DataEnable.Set(true); // Gets set even if already set.
        int n = write(write_fd, write_data, write_datalen);
        write_data += n;
        write_datalen -= n;
        if (write_datalen == 0) {
            write_state = draining;
        }
    }
}
Stale thread, but I have been working on RS-485 with a 16550-compatible UART under Linux and found:
tcdrain() works, but it adds a delay of 10 to 20 ms; it appears to be implemented by polling.
The value returned by TIOCOUTQ seems to count bytes in the OS buffer but NOT bytes in the UART FIFO, so it may underestimate the remaining delay if transmission has already started.
I am currently using CLOCK_MONOTONIC to timestamp each send, calculating when the send should be complete, then checking that time against the next send and delaying if necessary. Sucks, but it seems to work.
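Roughly like this (a sketch of what I mean; assumes 8N1 framing, i.e. 10 bit-times per byte):
#include <stdint.h>
#include <time.h>

// After each write(), compute when the last bit should leave the wire.
struct timespec tx_done_time(size_t nbytes, long baud) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    int64_t nsec = (int64_t)nbytes * 10 * 1000000000LL / baud; // 10 bits/byte (8N1)
    t.tv_sec  += nsec / 1000000000LL;
    t.tv_nsec += nsec % 1000000000LL;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}
// Before the next send (or before dropping data enable), compare the current
// CLOCK_MONOTONIC time against this value and sleep the difference.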

Boost asio tcp socket available reports incorrect number of bytes

In an SSL client/server model, I use the code below to read data from the socket on either the client or the server side.
I only read data when there is data available. To know when there is data available, I check the available() method on the lowest_layer() of the asio::ssl::stream.
After I send 380 bytes from the client to the server and enter the read method on the server, I see the following.
‘s’ is the buffer I supplied.
‘n’ is the size of the buffer I supplied.
‘a1’ is the result of available() before the read and will report 458 bytes.
‘r’ is the number of bytes actually read. It will report 380, which is correct.
‘a2’ is the result of available() after the read and will report 0 bytes. This is what I expect, since my client sent 380 bytes and I have read them all.
Why does the first call to available() report too many bytes?
Types:
/**
 * Type used as SSL Socket. Handles SSL and socket functionality.
 */
typedef boost::asio::ssl::stream<boost::asio::ip::tcp::socket> SslSocket;

/**
 * A shared pointer version of the SSL Socket type.
 */
typedef boost::shared_ptr<SslSocket> ShpSslSocket;
Members:
ShpSslSocket m_shpSecureSocket;
Part of the read method:
std::size_t a1 = 0;
if ((a1 = m_shpSecureSocket->lowest_layer().available()) > 0)
{
    r += boost::asio::read(*m_shpSecureSocket,
                           boost::asio::buffer(s, n),
                           boost::asio::transfer_at_least(1));
}
std::size_t a2 = m_shpSecureSocket->lowest_layer().available();
Added info:
So I changed my read method to check more thoroughly whether there is still data available to be read from the boost::asio::ssl::stream. Not only do I need to check whether data is available at the socket level; data may also be stuck in an OpenSSL buffer somewhere. SSL_peek does the trick. Besides checking for available data, the method also checks the TCP port status, and does all of this as long as there is no timeout.
Here is the complete read method of the boost::iostreams::device class that I created.
std::streamsize SslClientSocketDevice::read(char* s, std::streamsize n)
{
    // Request from the stream/device to receive/read bytes.
    std::streamsize r = 0;
    LIB_PROCESS::TcpState eActualState = LIB_PROCESS::TCP_NOT_EXIST;
    char chSslPeekBuf; // 1 byte peek buffer

    // Check that there is data available. If not, wait for it.
    // Check is on the lowest layer (tcp). In that layer the data is encrypted.
    // The number of encrypted bytes is most often different than the number
    // of unencrypted bytes that would be read from the secure socket.
    // Also: Data may be read by OpenSSL from the socket and remain in an
    // OpenSSL buffer somewhere. We also check that.
    boost::posix_time::ptime start = BOOST_UTC_NOW;
    int nSslPeek = 0;
    std::size_t nAvailTcp = 0;
    while ((*m_shpConnected) &&
           (LIB_PROCESS::IpMonitor::CheckPortStatusEquals(GetLocalEndPoint(),
                                                          GetRemoteEndPoint(),
                                                          ms_ciAllowedStates,
                                                          eActualState)) &&
           ((nAvailTcp = m_shpSecureSocket->lowest_layer().available()) == 0) &&
           ((nSslPeek = SSL_peek(m_shpSecureSocket->native_handle(), &chSslPeekBuf, 1)) <= 0) && // May return error (<0) as well
           ((start + m_oReadTimeout) > BOOST_UTC_NOW))
    {
        boost::this_thread::sleep(boost::posix_time::millisec(10));
    }

    // Always read data when there is data available, even if the state is no longer valid.
    // Data may be reported by the TCP socket (num encrypted bytes) or have already been read
    // by SSL and not yet returned to us.
    // Remote party can have sent data and have closed the socket immediately.
    if ((nAvailTcp > 0) || (nSslPeek > 0))
    {
        r += boost::asio::read(*m_shpSecureSocket,
                               boost::asio::buffer(s, n),
                               boost::asio::transfer_at_least(1));
    }

    // Close socket when state is not valid.
    if ((eActualState & ms_ciAllowedStates) == 0x00)
    {
        LOG4CXX_INFO(LOG4CXX_LOGGER, "TCP socket not/no longer connected. State is: " <<
                     LIB_PROCESS::IpMonitor::TcpStateToString(eActualState));
        LOG4CXX_INFO(LOG4CXX_LOGGER, "Disconnecting socket.");
        Disconnect();
    }

    if (!(*m_shpConnected))
    {
        if (r == 0)
        {
            r = -1; // Signal stream is closed if no data was retrieved.
            ThrowExceptionStreamFFL("TCP socket not/no longer connected.");
        }
    }
    return r;
}
So maybe I know why this is. It is an SSL connection, and therefore the transferred bytes are encrypted. The encrypted stream is larger than the plaintext, likely because of TLS record framing (record headers, MAC and padding) and possibly leftover handshake bytes; available() on the lowest layer counts those on-the-wire bytes (458) rather than the 380 plaintext bytes. I guess that answers why the number of bytes available at the TCP level differs from the number of bytes that comes out of a read.