MPI_ERR_BUFFER: invalid buffer pointer - c++

What is the most common reason for the error
MPI_ERR_BUFFER: invalid buffer pointer
resulting from MPI_Bsend() and MPI_Recv() calls?
The program works fine when the number of parallel processes is small (<14), but when I increase the number of processes I get this error.

To expand on my previous comment:
Buffering in MPI can happen in several ways. Messages can be buffered internally by the MPI library in order to hide network latency (usually only done for small messages up to an implementation-dependent size), or buffering can be enforced by the user with either of the buffered send operations MPI_Bsend() and MPI_Ibsend(). User buffering differs from internal buffering in two ways:
first, messages sent with MPI_Bsend() or MPI_Ibsend() are always buffered, which is not the case with internally buffered messages: the latter may or may not be buffered depending on their size and the availability of internal buffer space;
second, because of this "always buffer" behaviour, an MPI_ERR_BUFFER error occurs if no space is left in the user-attached buffer.
Sent messages occupy buffer space until they are known for sure to have been received by the destination process. Since MPI does not provide any built-in mechanism to confirm the reception of a message, one has to devise another way to do it, e.g. by sending a confirmation message back from the destination process to the source.
For that reason one has to treat every message that was not explicitly confirmed as still being in transit and allocate enough buffer memory for it. Usually this means the buffer should be at least as large as the total amount of data you are willing to transfer, plus the message envelope overhead, which equals number_of_sends * MPI_BSEND_OVERHEAD. This can put a lot of memory pressure on large MPI jobs, so keep it in mind and adjust the buffer space accordingly when the number of processes changes.
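As a rough sketch (untested, with made-up sizes), the attached buffer would typically be dimensioned along these lines:
int count = 1000, num_sends = 16, pack_size, buf_size;
MPI_Pack_size(count, MPI_INT, MPI_COMM_WORLD, &pack_size);
buf_size = num_sends * (pack_size + MPI_BSEND_OVERHEAD);
void *buf = malloc(buf_size);
MPI_Buffer_attach(buf, buf_size);
/* ... up to num_sends unconfirmed MPI_Bsend() calls of count ints each ... */
MPI_Buffer_detach(&buf, &buf_size);  /* blocks until all buffered messages are delivered */
free(buf);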
Note that buffered send is provided merely as a convenience. It could readily be implemented as a combination of memory duplication and a non-blocking send operation, i.e. buffered send frees you from writing code like this:
int data[count];                 /* count elements to be sent */
int *shadow_data;
MPI_Request req;
...
<populate data>
...
shadow_data = (int *)malloc(sizeof(data));
memcpy(shadow_data, data, sizeof(data));
MPI_Isend(shadow_data, count, MPI_INT, destination, tag, MPI_COMM_WORLD, &req);
...
<reuse data as it is no longer needed by MPI>
...
MPI_Wait(&req, MPI_STATUS_IGNORE);
free(shadow_data);
If memory is scarce then you should resort to non-blocking sends only.

Related

How to receive more than 65,000 bytes in a C++ socket using recv()

I am developing a client server application (TCP) in Linux using C++. I want to send more than 65,000 bytes at the same time. In TCP, the maximum packet size is 65,535 bytes only.
How can I send the entire bytes without loss?
Following is my code at server side.
//Receive the message from client socket
if((iByteCount = recv(GetSocketId(), buffer, MAXRECV, MSG_WAITALL)) > 0)
{
    printf("\n Received bytes %d\n", iByteCount);
    SetReceivedMessage(buffer);
    return LS_RESULT_OK;
}
If I use MSG_WAITALL it takes a long time to receive the bytes, so how can I set the flags to receive more than 1 million bytes at a time?
Edit: The MTU size is 1500 bytes, but the absolute limitation on TCP packet size is 65,535.
Judging from the comments above, it seems you don't understand how recv works, or how it is supposed to be used.
You really want to call recv in a loop, until either you know that the expected amount of data has been received or until you get a "zero bytes read" result, which means the other end has closed the connection. Always, no exceptions.
If you need to do other things concurrently (likely, with a server process!) then you will probably want to check descriptor readiness with poll or epoll first. That lets you multiplex sockets as they become ready.
The reason why you want to do it that way, and never any differently, is that you don't know how the data will be packetized and how (or when) packets will arrive. Plus, recv gives no guarantee about the amount of data read at a time. It will offer what it has in its buffers at the time you call it, no more and no less (it may block if there's nothing, but even then you have no guarantee that any particular amount of data will be returned when it resumes; it may still return e.g. 50 bytes!).
Even if you only send, say, 5,000 bytes total, it is perfectly valid behaviour for TCP to break this into 5 (or 10, or 20) packets, and for recv to return 500 (or 100, or 20, or 1) bytes at a time, every time you call it. That's just how it works.
TCP guarantees that anything you send will eventually arrive at the other end or produce an error. And, it guarantees that whatever you send arrives in order. It does not guarantee much else. Above all, it does not guarantee that any particular amount of data is ready at any given time.
You must be prepared for that, and the only way to do it is calling recv repeatedly. Otherwise you will always lose data under some circumstances.
MSG_WAITALL should in principle make it work the way you expect, but that is bad behaviour, and it is not guaranteed to work. If the socket (or some other structure in the network stack) runs against a soft or hard limit, it may not, and probably will not fulfill your request. Some limits are obscure, too. For example, the number for SO_RCVBUF must be twice as large as what you expect to receive under Linux, because of implementation details.
Correct behaviour of a server application should never depend on assumptions such as "it fits into the receive buffer". Your application needs to be prepared, in principle, to receive terabytes of data using a 1 kilobyte receive buffer, and in chunks of 1 byte at a time, if need be. A larger receive buffer will make it more efficient, but that's it... it still has to work either way.
The fact that you only see failures upwards of some "huge" limit is just luck (or rather, bad luck). The fact that it apparently "works fine" up to that limit suggests what you do is correct, but it isn't. It's an unlucky coincidence that it works.
EDIT:
As requested in the comment below, here is what this could look like (the code is obviously untested, caveat emptor):
std::vector<char> result;
ssize_t size;
char recv_buf[250];

for (;;)
{
    if ((size = recv(fd, recv_buf, sizeof(recv_buf), 0)) > 0)
    {
        result.insert(result.end(), recv_buf, recv_buf + size);
    }
    else if (size == 0)
    {
        if (result.size() < expected_size)
        {
            printf("premature close, expected %zu, only got %zu\n",
                   (size_t)expected_size, result.size());
        }
        else
        {
            do_something_with(result);
        }
        break;
    }
    else
    {
        perror("recv");
        exit(1);
    }
}
That will receive any amount of data you want (or until operator new throws bad_alloc after allocating a vector several hundred MiB in size, but that's a different story...).
If you want to handle several connections, you need to add poll or epoll or kqueue or a similar functionality (or... fork), I'll leave this as exercise for the reader.
It is possible that your problem is related to kernel socket buffer sizes. Try adding the following to your code:
int buffsize = 1024*1024;
setsockopt(s, SOL_SOCKET, SO_RCVBUF, &buffsize, sizeof(buffsize));
You might need to increase some sysctl variables too:
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
Note however, that relying on TCP to fill your whole buffer is generally a bad idea. You should rather call recv() multiple times. The only good reason why you would want to receive more than 64K is for improved performance. However, Linux should already have auto-tuning that will progressively increase the buffer sizes as required.
"in TCP the max packet size is 65,535 bytes"
No, it isn't. TCP is a byte-stream protocol carried in segments over IP packets, and it imposes no limit on how much you can transmit over a single connection. Look at all those 100MB downloads: how do you think they work?
Just send and receive the data. You'll get it.
I would suggest exploring kqueue or something similar. With event notification there is no need to loop on recv. Just call a simple read function upon an EV_READ event and issue a single recv on the socket that triggered the event. Your buffer can be 10 bytes or as large as you want; it doesn't matter, because if you did not read the entire message the first time around you will simply get another EV_READ event on the socket and call your read function again. When the peer closes the connection you'll get an EOF indication on the event. No need to hassle with loops that may or may not block other incoming connections.
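A minimal sketch of that approach (untested, BSD/macOS only; sockfd is assumed to be an already-connected socket):
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <unistd.h>

void watch_socket(int sockfd)
{
    int kq = kqueue();
    struct kevent change;
    EV_SET(&change, sockfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);           // register interest in reads

    char buf[10];                                    // deliberately tiny buffer
    for (;;)
    {
        struct kevent event;
        if (kevent(kq, NULL, 0, &event, 1, NULL) <= 0)   // wait for readiness
            break;
        ssize_t got = recv(sockfd, buf, sizeof(buf), 0);
        if (got <= 0)                                // 0: peer closed, <0: error
            break;
        // process `got` bytes here; if more data remains buffered, kqueue
        // simply delivers another EVFILT_READ event, so no recv() loop is needed.
    }
    close(kq);
}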

WSARecv, completion port model: how to manage buffers and avoid overruns?

My problem: my completion port server will receive data of unknown size from different clients; the thing is, I don't know how to avoid buffer overruns, i.e. how to avoid my (receive) buffer being "overfilled" with data.
Now to the questions:
1) If I make a receive call via WSARecv, does the worker thread work like a callback function? I mean, does it pick up the receive call only when it has completed, or does it also pick it up while the receive is still in progress? Does the lpNumberOfBytes variable (from GetQueuedCompletionStatus) contain the number of bytes received so far or the total number of bytes received?
2) How do I avoid overruns? I thought of dynamically allocated buffer structures, but then again, how do I find out how big the packet is going to get?
Edit: I hate to ask this, but is there any "simple" method for managing the buffer and avoiding overruns? Synchronisation sounds off-limits to me, at least right now.
If I make a receive call via WSARecv, does the worker thread work like a callback function?
See @valdo's post. Completion data is queued to your pool of threads and one will be made ready to process it.
'I mean, does it pick up the receive call only when it has completed?' Yes, hence the name. Note that the meaning of 'completed' may vary depending on the protocol. With TCP, it means that some streamed data bytes have been received from the peer.
'Does the lpNumberOfBytes (from GetQueuedCompletionStatus) variable contain the number of bytes received so far or the total number of bytes received?' It contains only the number of bytes received and loaded into the buffer array provided in that particular IOCP completion.
'How to avoid overruns, i thought of dynamically allocated buffer structures, but then again, how do i find out how big the package is going to get ?' You cannot get overruns if you provide the buffer arrays - the kernel thread/s that load the buffer/s will not exceed the passed buffer lengths. At application level, given the streaming nature of TCP, it's up to you to decide how to process the buffer arrays into useable application-level protocol-units. You must decide, using your knowledge of the services provided, on a suitable buffer management scheme.
My last IOCP server was somewhat general-purpose. I used an array of buffer pools and a pool of 'buffer-carrier' objects, allocated at startup, (along with a pool of socket objects). Each buffer pool held buffers of a different size. Upon a new connection, I issued a WSARecv using one buffer from the smallest pool. If this buffer got completely filled, I used a buffer from the next-larger pool for the next WSARecv, and so on.
Then there's the issue of the sequence numbers needed to prevent out-of-order buffering with multiple handler threads :(
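A minimal sketch of that pool-escalation decision (hypothetical names; the buffer pools themselves and the acquire/release calls are assumed to exist elsewhere):
#include <array>
#include <cstddef>

constexpr std::array<std::size_t, 4> kPoolSizes = { 1024, 4096, 16384, 65536 };

// Pick the pool for the next WSARecv on this connection: if the completed
// receive filled its buffer entirely, escalate to the next-larger pool.
int nextPoolIndex(int currentPool, std::size_t bytesReceived)
{
    const bool filled = (bytesReceived == kPoolSizes[currentPool]);
    const int last = static_cast<int>(kPoolSizes.size()) - 1;
    return (filled && currentPool < last) ? currentPool + 1 : currentPool;
}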
1. A completion port is a sort of queue (with sophisticated logic concerning the priority of threads waiting to dequeue an I/O completion from it). Whenever an I/O completes (either successfully or not), it's queued into the completion port. Then it's dequeued by one of the threads that called GetQueuedCompletionStatus.
So you never dequeue an I/O that is "in progress". Moreover, it's processed by your worker thread asynchronously; that is, the processing is delayed until your thread calls GetQueuedCompletionStatus.
2. This is actually a complex matter. Synchronization is not a trivial task overall, especially when it comes to symmetric multi-threading (where you have several threads, each of which may be doing anything).
One of the parameters you receive with a completed I/O is a pointer to an OVERLAPPED structure (the one you supplied to the function that issued the I/O, such as WSARecv). It's common practice to allocate your own structure that is based on OVERLAPPED (it either inherits from it or has it as the first member). Upon receiving a completion you may cast the dequeued OVERLAPPED to your actual data structure. There you may have everything needed for the synchronization: sync objects, state descriptions, and so on.
Note however that it's not a trivial task to synchronize things correctly (to get good performance and avoid deadlocks), even when you have the custom context. This demands a careful design.
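To illustrate the extended-OVERLAPPED idea described above, here is a rough, untested sketch with made-up names:
#include <winsock2.h>
#include <windows.h>

struct PerIoContext
{
    OVERLAPPED overlapped;   // must come first so an OVERLAPPED* can be cast back
    WSABUF     wsaBuf;
    char       buffer[4096];
    SOCKET     socket;
    // ... sync objects, sequence number, parser state, etc.
};

void workerLoop(HANDLE iocp)
{
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    OVERLAPPED* pOv = NULL;
    while (GetQueuedCompletionStatus(iocp, &bytes, &key, &pOv, INFINITE))
    {
        PerIoContext* ctx = reinterpret_cast<PerIoContext*>(pOv);
        // `bytes` bytes of data are now in ctx->buffer; process them, then
        // reissue WSARecv with this (or a larger) buffer for this socket.
    }
}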

Design of concurrent processing of a dual buffer system?

I have a long-running application that basically:
read packets off network
save it somewhere
process it and output to disk
A very common use case indeed, except that both the data size and the data rate can be quite large. To avoid running out of memory and to improve efficiency, I am thinking of a dual-buffer design where buffers A and B alternate: while A is holding network packets, B is being processed for output. Once buffer A reaches a soft bound, A is due for output processing and B takes over holding network packets.
I am not particularly experienced with concurrency/multi-threaded programming. I have read some past discussions on circular buffers that handle the multiple-producer, multiple-consumer case. I am not sure that is the best solution, and the dual-buffer design seems simpler.
My question is: is there a design pattern I can follow to tackle the problem, or a better design for that matter? If possible, please use pseudocode to help illustrate the solution. Thanks.
I suggest that you should, instead of assuming "two" (or any fixed number of ...) buffers, simply use a queue, and therefore a "producer/consumer" relationship.
The process that is receiving packets simply adds them to a buffer of a certain size and, either when the buffer is sufficiently full or a specified (short) time interval has elapsed, places the (non-empty) buffer onto a queue for processing by the other process. It then allocates a new buffer for its own use.
The consuming ("other") process is woken up any time there might be a new buffer in the queue for it to process. It removes the buffer, processes it, then checks the queue again; it goes to sleep only when it finds that the queue is empty. (Take care that the consumer cannot decide to go to sleep at the precise instant the producer decides to signal it... there must be no "race condition" here.)
Consider simply allocating storage "per-message" (whatever a "message" may mean to you), and putting that "message" onto the queue, so that there is no unnecessary delay in processing caused by "waiting for a buffer to fill up."
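A minimal sketch of such a producer/consumer queue (untested C++; the mutex/condition-variable pairing is what closes the race condition mentioned above):
#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

using Buffer = std::vector<char>;

class BufferQueue {
public:
    void push(Buffer b) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(b));
        }
        cv_.notify_one();                       // wake the consumer
    }
    Buffer pop() {                              // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Buffer b = std::move(q_.front());
        q_.pop();
        return b;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Buffer> q_;
};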
It might be worth mentioning a technique used in real-time audio processing/recording: a single ring buffer (or FIFO, if you prefer that term) of sufficient size can be used for this case.
You will then need a read cursor and a write cursor. (Whether you actually need a lock or can do with volatile plus memory barriers is a touchy subject, but the people at PortAudio suggest doing this without locks if performance is important.)
You can use one thread to read and another thread to write. The read thread should consume as much of the buffer as possible. You will be safe unless you run out of buffer space, but that risk exists for the dual-buffer solution as well. So the underlying assumption is that you can write to disk faster than the input comes in, or you will need to expand on the solution.
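For illustration, here is a rough, untested sketch (not the PortAudio code) of a single-producer/single-consumer ring buffer with atomic cursors; the capacity must be a power of two, and one slot is left unused to distinguish full from empty:
#include <atomic>
#include <cstddef>

template <std::size_t N>           // N must be a power of two
class SpscRing {
public:
    bool push(char c) {            // producer thread only
        std::size_t w = write_.load(std::memory_order_relaxed);
        std::size_t next = (w + 1) & (N - 1);
        if (next == read_.load(std::memory_order_acquire)) return false; // full
        buf_[w] = c;
        write_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(char& c) {            // consumer thread only
        std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;   // empty
        c = buf_[r];
        read_.store((r + 1) & (N - 1), std::memory_order_release);
        return true;
    }
private:
    char buf_[N];
    std::atomic<std::size_t> read_{0}, write_{0};
};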
Find a producer-consumer queue class that works. Use one to create a buffer pool to improve performance and control memory use. Use another to transfer the buffers from the network thread to the disk thread:
#define CnumBuffs 128
#define CbufSize 8182
#define CcacheLineSize 128

class netBuf {
private:
    char cacheLineFiller[CcacheLineSize]; // anti false-sharing padding
public:
    int dataLen;
    char bigBuf[CbufSize];
};

PCqueue *pool;       // pool of free buffers
PCqueue *diskQueue;  // filled buffers on their way to the disk thread
netThread *netThread;
diskThread *diskThread;

pool = new(PCqueue);
diskQueue = new(PCqueue);
// make an object pool
for (i = 0; i < CnumBuffs; i++) {
    pool->push(new(netBuf));
}
netThread = new(netThread);
diskThread = new(diskThread);
netThread->start();
diskThread->start();
..
void* netThread.run {
    netBuf *thisBuf;
    for (;;) {
        pool->pop(&thisBuf);                 // blocks if the pool is empty
        thisBuf->dataLen = network.read(&thisBuf->bigBuf, sizeof(thisBuf->bigBuf));
        diskQueue->push(thisBuf);
    }
}

void* diskThread.run {
    fileStream *myFile;
    netBuf *thisBuf;
    myFile = new fileStream("someFolder\\fileSpec", someEnumWrite);
    for (;;) {
        diskQueue->pop(&thisBuf);            // blocks until a buffer is available
        myFile->write(&thisBuf->bigBuf, thisBuf->dataLen);
        pool->push(thisBuf);                 // return the buffer to the pool
    }
}

Buffering Incomplete High Speed Reads

I am reading ~100 bytes of data at 100 Hz from a serial port. My buffer is 1024 bytes, so often my buffer doesn't get completely used. Sometimes, however, I get hiccups from the serial port and the buffer gets filled up.
My data is organized as a [header]data[checksum]. When my buffer gets filled up, sometimes a message/data is split across two reads from the serial port.
This is a simple problem, and I'm sure there are a lot of different approaches. I am ahead of schedule, so I would like to research different approaches. Could you name some paradigms that cover buffering of high-speed data that might need to be put together from two reads? Note, the main difference I see between this problem and, say, other buffering I've done (image acquisition, TCP/IP) is that there we were guaranteed full packets/messages. Here a "packet" may be split between reads, which we will only know once we start parsing the data.
Oh yes, note that the data buffered in from the read has to be parsed, so to make things simple, the data should be contiguous when it reaches the parser. (Plus, I don't think that's the parser's responsibility.)
Some Ideas I Had:
Carry over unused bytes to my original buffer, then fill it with the read after the left over bytes from the previous read. (For example, we read 1024 bytes, 24 bytes are left at the end, they're a partial message, memcpy to the beginning of the read_buffer_, pass the beginning + 24 to read and read in 1024 - 24)
Create my own class that just gets blocks of data. It has two pointers, read/write and a large chunk of memory (1024 * 4). When you pass in the data, the class updates the write pointer correctly, wraps around to the beginning of its buffer when it reaches the end. I guess like a ring buffer?
I was thinking maybe using a std::vector<unsigned char>. Dynamic memory allocation, guaranteed to be contiguous.
Thanks for the info guys!
Define some 'APU' application-protocol-unit class that will represent your '[header]data[checksum]'. Give it some 'add' function that takes a char parameter and returns a 'valid' bool. In your serial read thread, create an APU and read some data into your 1024-byte buffer. Iterate the data in the buffer, pushing it into the APU add() until either the APU add() function returns true or the iteration is complete. If the add() returns true, you have a complete APU - queue it off for handling, create another one and start add()-ing the remaining buffer bytes to it. If the iteration is complete, loop back round to read more serial data.
The add() method would use a state-machine, or other mechanism, to build up and check the incoming bytes, returning 'true' only in the case of a full sanity-checked set of data with the correct checksum. If some part of the checking fails, the APU is 'reset' and waits to detect a valid header.
The APU could maybe parse the data itself, either byte-by-byte during the add() data input, just before add() returns with 'true', or perhaps as a separate 'parse()' method called later, perhaps by some other APU-processing thread.
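As a concrete but made-up example, suppose the framing were a 0x7E header byte, a one-byte length, the payload, and a one-byte XOR checksum; the add() state machine might then look roughly like this (untested sketch, not your actual protocol):
#include <cstddef>
#include <cstdint>
#include <vector>

class Apu {
public:
    // Feed one byte; returns true when a complete, checksum-valid unit is held.
    bool add(uint8_t byte) {
        switch (state_) {
        case State::Header:
            if (byte == 0x7E) { data_.clear(); state_ = State::Length; }
            return false;
        case State::Length:
            expected_ = byte;
            state_ = expected_ ? State::Payload : State::Checksum;
            return false;
        case State::Payload:
            data_.push_back(byte);
            if (data_.size() == expected_) state_ = State::Checksum;
            return false;
        case State::Checksum: {
            uint8_t sum = 0;
            for (uint8_t b : data_) sum ^= b;
            state_ = State::Header;             // reset either way
            return sum == byte;                 // true: data_ holds a valid unit
        }
        }
        return false;
    }
    const std::vector<uint8_t>& data() const { return data_; }
private:
    enum class State { Header, Length, Payload, Checksum };
    State state_ = State::Header;
    std::size_t expected_ = 0;
    std::vector<uint8_t> data_;
};
The serial read thread then just iterates over each received block, calling add() per byte and queueing the accumulated data whenever add() returns true.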
When reading from a serial port at speed, you typically need some kind of handshaking mechanism to control the flow of data. This can be hardware (e.g. RTS/CTS), software (Xon/Xoff), or controlled by a higher level protocol. If you're reading a large amount of data at speed without handshaking, your UART or serial controller needs to be able to read and buffer all the available data at that speed to ensure no data loss. On 16550 compatible UARTs that you see on Windows PCs, this buffer is just 14 bytes, hence the need for handshaking or a real time OS.

About recv and the read buffer - C Berkeley Sockets

I am using berkeley sockets and TCP (SOCK_STREAM sockets).
The process is:
I connect to a remote address.
I send a message to it.
I receive a message from it.
Imagine I am using the following buffer:
char recv_buffer[3000];
recv(socket, recv_buffer, 3000, 0);
Questions are:
How can I know whether, after calling recv the first time, the read buffer is empty or not? If it's not empty I would have to call recv again, but if I do that when it's empty it would block for a long time.
How can I know how many bytes I have read into recv_buffer? I can't use strlen because the message I receive can contain null bytes.
Thanks.
How can I know whether, after calling recv the first time, the read buffer is empty or not? If it's not empty I would have to call recv again, but if I do that when it's empty it would block for a long time.
You can use the select or poll system calls along with your socket descriptor to tell if there is data waiting to be read from the socket.
However, usually there should be an agreed-upon protocol that both sender and receiver follow, so that both parties know how much data is to be transferred. For example, perhaps the sender first sends a 2-byte integer indicating the number of bytes it will send. The receiver then first reads this 2-byte integer, so that it knows how many more bytes to read from the socket.
Regardless, as Tony pointed out below, a robust application should use a combination of length-information in the header, combined with polling the socket for additional data before each call to recv, (or using a non-blocking socket). This will prevent your application from blocking in the event that, for example, you know (from the header) that there should still be 100 bytes remaining to read, but the peer fails to send the data for whatever reason (perhaps the peer computer was unexpectedly shut off), thus causing your recv call to block.
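A minimal sketch of that length-prefix scheme (untested; blocking sockets, 2-byte length in network byte order as in the example above):
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read exactly len bytes by looping on recv(); false on error or peer close.
static bool recv_exact(int fd, void* dst, size_t len)
{
    char* p = static_cast<char*>(dst);
    while (len > 0)
    {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return false;
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

bool recv_message(int fd, std::vector<char>& out)
{
    uint16_t netLen = 0;
    if (!recv_exact(fd, &netLen, sizeof(netLen)))   // 2-byte length prefix
        return false;
    out.resize(ntohs(netLen));                      // network byte order
    return out.empty() || recv_exact(fd, out.data(), out.size());
}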
How can I know how many bytes I have read into recv_buffer? I can't use strlen because the message I receive can contain null bytes.
The recv system call will return the number of bytes read, or -1 if an error occurred.
From the man page for recv(2):
[recv] returns the number of bytes received, or -1 if an error occurred. The return value will be 0 when the peer has performed an orderly shutdown.
How can I know if, after calling recv the first time, the read buffer is empty or not?
Even the first time (after accepting a client), the recv can block and fail if the client connection has been lost. You must either:
use select or poll (BSD sockets) or some OS-specific equivalent, which can tell you whether there is data available on specific socket descriptors (as well as exception conditions, and buffer space you can write more output to)
set the socket to be non-blocking, so that recv will only return whatever is immediately available (possibly nothing)
create a thread that you can afford to have block in recv, knowing other threads will be doing the other work you want to continue with
How can I know how many bytes I have read into recv_buffer? I can't use strlen because the message I receive can contain null bytes.
recv() returns the number of bytes read, or -1 on error.
Note that TCP is a byte stream protocol, which means that you're only guaranteed to be able to read and write bytes from it in the correct order, but the message boundaries are not guaranteed to be preserved. So, even if the sender has made a large single write to their socket, it can be fragmented en route and arrive in several smaller blocks, or several smaller send()/write()s can be consolidated and retrieved by one recv()/read().
For that reason, make sure you loop calling recv until you either get all the data you need (i.e. a complete logical message you can process) or an error. You should be prepared to handle getting part or all of subsequent sends from your client (if you don't have a protocol where each side only sends after getting a complete message from the other, and are not using headers with message lengths). Note that doing separate recvs for the message header (with length) and then the body can result in a lot more calls to recv(), with a potential adverse effect on performance.
These reliability issues are often ignored. They manifest themselves less often on a single host or on a reliable, fast LAN with fewer routers and switches involved and fewer or non-concurrent messages, then break under load and over more complex networks.
If the recv() returns fewer than 3000 bytes, then you can assume that the read buffer was empty. If it returns 3000 bytes in your 3000-byte buffer, then you'd better know whether to continue. Most protocols include some variation on TLV - type, length, value. Each message contains an indicator of the type of message, some length (possibly implied by the type if the length is fixed), and the value. If, on reading through the data you did receive, you find that the last unit is incomplete, you can assume there is more to be read. You can also make the socket into a non-blocking socket; then the recv() will fail with EAGAIN or EWOULDBLOCK if there is no data available for reading.
The recv() function returns the number of bytes read.
ioctl() with the FIONREAD option tells you how much data can currently be read without blocking.
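For example (a small sketch, assuming an already-open socket descriptor fd):
#include <sys/ioctl.h>
#include <cstdio>

void print_readable(int fd)
{
    int available = 0;
    if (ioctl(fd, FIONREAD, &available) == 0)
        std::printf("%d bytes can be read right now without blocking\n", available);
}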