Better way to handle read buffer in C++ - c++

I am working on a Linux server using epoll. I have this code to read into a buffer:
int str_len = read(m_events[i].data.fd, buf, BUF_SIZE);
if (str_len == 0) {
    if (!removeClient(m_events[i].data.fd))
        break;
    close(m_events[i].data.fd);
} else {
    char *pdata = buf;
    pushWork(pdata);
}
buf is declared like this:
char buf[BUF_SIZE];
The pushWork function is declared like this:
pushWork(char *pdata) {
    // push pdata to the thread pool's queue
}
First of all, I think char *pdata = buf has a problem, since it just points to the buffer, and the buffer will be overwritten whenever new data comes in. So do I need to memcpy?
Also, is there any other nice way to handle this in C++? This code is kind of C style; I think there is a better way to do this in C++.

is there any other nice way to handle this in C++? This code is kind of C style; I think there is a better way to do this in C++
Like I suggested in one of your previous questions, the Boost.Asio library is the de facto C++ networking library. I strongly suggest you spend some time reading about it and studying the examples if you are writing a C++ networking application.

That way is bad because you have only one buf, meaning that only one thread at a time can work on it, so you are not using threads the way they should be used.
What you can do is malloc() a buffer, copy the payload into that malloc'd buffer, and pass it to the thread; the thread then frees the buffer once it's done with it. Just be sure it gets freed, or else there will be memory leaks.
Example code:
char *p = (char *)malloc(str_len + 1);
memcpy(p, buf, str_len);
p[str_len] = '\0';  /* NUL-terminate, in case the payload is treated as a string */
pushWork(p);        /* free p inside, after use */

You will need to do a memcpy() (or something equivalent) unless you can arrange things so that the data in buf is never needed after pushWork() returns. If, on the other hand, you can afford to do all the processing of the buffer's data inside pushWork(), no copying of the buffer is necessary.
Another alternative would be to allocate a buffer off the heap each time through the epoll loop, and read() directly into that buffer instead of reading into a buffer on the stack; that way the dynamically allocated buffer would still be available even after the event loop goes on to other things. Note that doing this opens the door to possible memory leaks or memory exhaustion, so you'd need to be careful to avoid those issues.
As far as a "C++ way" to do this, I'm sure there are some, but I don't know that they will necessarily improve your program. The C way works fine, why fix what isn't broken?
(One other thing to note: if you are using non-blocking I/O, then read() will sometimes read and return fewer than BUF_SIZE bytes, in which case you'll need logic to handle that eventuality. If you are using blocking I/O, on the other hand, read() will sometimes block for long periods (e.g. if a client machine dies having sent only half of a buffer), which will block your epoll loop and make it unresponsive for perhaps several minutes. That's a more difficult problem to deal with, which is why I usually end up using non-blocking I/O, even if it is more of a pain to get right.)

Related

Kill std::thread while reading a big file

I have a std::thread function that calls fopen to load a big file into an array:
void loadfile(char *fname, char *fbuffer, long fsize)
{
    FILE *fp = fopen(fname, "rb");
    fread(fbuffer, 1, fsize, fp);
    fclose(fp);
}
This is called by:
std::thread loader(loadfile, fname, fbuffer, fsize);
loader.detach();
At some point, something in my program wants to stop reading that file and asks for another one. The problem is that by the time I delete the fbuffer pointer, the loader thread is still running, and I get a race condition that throws an exception.
How can I kill that thread? My idea was to check for the existence of fbuffer and maybe split the fread into small chunks:
void loadfile(char *fname, char *fbuffer, long fsize)
{
    FILE *fp = fopen(fname, "rb");
    long ch = 0;
    while ((ch += 256) < fsize)
    {
        if (fbuffer == NULL) return;
        fread(fbuffer + ch, 1, 256, fp);
    }
    fclose(fp);
}
Will this slow down the reading of the file? Do you have a better idea?
You should avoid killing a thread at all costs. Doing so causes evil things to happen, like resources left in a permanently locked state.
The thread must be given a reference to a flag, the value of which can be set from elsewhere, to tell the thread to voluntarily quit.
You cannot use a buffer for this purpose; if one thread deletes the memory of the buffer while the other is writing to it, very evil things will happen. (Memory corruption.) So, pass a reference to a boolean flag.
Of course, in order for the thread to be able to periodically check the flag, it must have small chunks of work to do, so splitting your freads to small chunks was a good idea.
256 bytes might be a bit too small though; definitely use 4k or more, perhaps even 64k.
Killing threads is usually not the way to go - doing this may lead to leaked resources, critical sections you cannot exit and inconsistent program state.
Your idea is almost spot-on, but you need a way to signal the thread to finish. You can use a boolean shared between your thread and the rest of your code; your thread checks it after every read, and once it is set, it stops reading into the buffer, cleans up the file handles, and exits cleanly.
On another note, manually handling the deletion of pointers with owning semantics is, most of the time, frowned upon in modern C++. Unless you have a very good reason not to, I'd recommend using the standard library's fstream and string classes.
You need proper thread synchronization. The comments about resource leaks and the proposal by @Mike Nakis about making the thread exit voluntarily by setting a boolean are correct, but not complete. You need to go even further than that.
You must ensure not only that the loader thread exits on its own, but also that it has exited before you delete the buffer it is writing to. Or, at least, you must ensure that it never touches that buffer in any way after you have deleted it. Checking the pointer for null-ness does not work, for two reasons. First, it doesn't work anyway, since you are looking at a copy of the original pointer (you would have to use a pointer-to-pointer or a reference). Second, and more importantly, even if the check worked, there is a race condition between the if statement and fread. In other words, there is no way to guarantee that you aren't freeing the buffer while fread is accessing it (no matter how small you make your chunks).
At the very minimum you need two boolean flags, but preferably you would use a proper synchronization primitive such as a condition variable to notify the main thread (so you don't have to spin waiting for the loader to exit, but can block).
The correct way of operation would be:
1. Notify the loader thread.
2. Wait for the loader thread to signal me (block on the cond var).
3. The loader thread picks up the notification, sets the condition variable, never touches the buffer afterwards, and exits.
4. Resume (delete the buffer, allocate a new buffer, etc.).
If you do not insist on detaching the loader thread, you could instead simply join it after telling it to exit (so you would not need a cond var).

What is the best way to implement an echo server with async i/o and IOCP?

As we all know, an echo server is a server that reads from a socket, and writes that very data into another socket.
Since Windows I/O completion ports give you different ways to do things, I was wondering what the best (most efficient) way to implement an echo server is. I'm sure I'll find someone who has tested the approaches I describe here and can contribute their findings.
My classes are Stream, which abstracts a socket, named pipe, or whatever, and IoRequest, which abstracts both an OVERLAPPED structure and the memory buffer used for the I/O (suitable, of course, for both reading and writing). This way, when I allocate an IoRequest, I'm simply allocating memory for the data buffer plus the OVERLAPPED structure in one shot, so I call malloc() only once.
In addition to this, I also implement fancy and useful things in the IoRequest object, such as an atomic reference counter, and so on.
Said that, let's explore the ways to do the best echo server:
-------------------------------------------- Method A. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Let's copy the buffer just received with the "reader" IoRequest to the "writer" IoRequest. (this will involve a memcpy() or whatever).
3) Let's fire again a new reading with ReadFile() in the "reader", with the same IoRequest used for reading.
4) Let's fire a new writing with WriteFile() in the "writer".
-------------------------------------------- Method B. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Instead of copying data, pass that IoRequest to the "writer" for writing, without copying data with memcpy().
3) The "reader" now needs a new IoRequest to continue reading; allocate a new one, or reuse one allocated earlier, perhaps one whose write just completed before the new write happens.
So, in the first case, every Stream objects has its own IoRequest, data is copied with memcpy() or similar functions, and everything works fine.
In the second case the two Stream objects pass IoRequest objects to each other without copying data, but it's a little more complex: you have to manage the "swapping" of IoRequest objects between the two Stream objects, with the possible drawback of synchronization problems (what about completions that happen in different threads?)
My questions are:
Q1) Is avoiding copying data really worth it!?
Copying 2 buffers with memcpy() or similar, is very fast, also because the CPU cache is exploited for this very purpose.
Consider that with the first method I have the possibility to echo from one "reader" socket to multiple "writer" sockets, but with the second one I can't do that, since I would have to create N new IoRequest objects for the N writers, because each WriteFile() needs its own OVERLAPPED structure.
Q2) I guess that when I fire a new N writings for N different sockets with WriteFile(), I have to provide N different OVERLAPPED structure AND N different buffers where to read the data.
Or can I fire N WriteFile() calls with N different OVERLAPPED structures, taking the data from the same buffer for the N sockets?
Is avoiding copying data really worth it!?
Depends on how much you are copying. 10 bytes, not so much. 10MB, then yes, it's worth avoiding the copying!
In this case, since you already have an object that contains the rx data and an OVERLAPPED block, it seems somewhat pointless to copy it - just reissue it to WSASend(), or whatever.
but with the second one I can't do that
You can, but you need to separate the 'IoRequest' class from a 'Buffer' class. The Buffer holds the data, an atomic int reference count, and any other management info shared by all calls; the IoRequest holds the OVERLAPPED block, a pointer to the Buffer, and any per-call management information.
The IoRequest is the class that is used for each send call. Since it contains only a pointer to the Buffer, there is no need to copy the data, so it stays reasonably small and O(1) with respect to the data size.
When the tx completions come in, the handler threads get the IoRequest, dereference the Buffer, and decrement its atomic int towards zero. The thread that hits 0 knows that the Buffer object is no longer needed and can delete it (or, more likely in a high-performance server, repool it for later reuse).
Or, I can fire N WriteFile() calls with N different OVERLAPPED taking the data from the same buffer for the N sockets?
Yes, you can. See above.
Re. threading - sure, if your 'management data' can be reached from multiple completion-handler threads, then yes, you may want to protect it with a critical-section, but an atomic int should do for the buffer refcount.

Bad memory address when allocating OpenCL buffer

I have a program running some image processing with OpenCL. I sometimes get a crash because it's trying to write to a memory address (obtained with clCreateBuffer) that is null.
Is there any OpenCL call I can use to delay that memory write, or is it possible to check in C++ whether a memory address is valid?
You can probably use OpenCL events.
cl_int clWaitForEvents(cl_uint num_events,
                       const cl_event *event_list)
You can create an event from the call or operation you want to wait for, then before creating your buffer you wait for that event to complete.
However, could you provide a little more information? For example, what exactly do you want to do? Maybe there is another way. It would also help if you showed some code for your operations.

Asynchronous File I/O in C++

I can't find information about asynchronous reading and writing in C++, so I wrote the code below. The read() function works correctly, but the synchronization doesn't: the sync() function doesn't wait for the end of the read.
In my opinion, the variable state_read has an incorrect value in the thread. Please help me understand why.
struct IOParams {
    char *buf;
    unsigned int nBytesForRead;
    FILE *fp;
};

struct AsyncFile {
    FILE *fp;
    bool state_read;
    HANDLE hThreadRead;
    IOParams read_params;

    void read(char *buf, unsigned int nBytesForRead) {
        sync();
        read_params.buf = buf;
        read_params.fp = fp;
        read_params.nBytesForRead = nBytesForRead;
        hThreadRead = CreateThread(0, 0, ThreadFileRead, this, 0, 0);
    }

    void sync() {
        if (state_read) {
            WaitForSingleObject(hThreadRead, INFINITE);
            CloseHandle(hThreadRead);
        }
        state_read = false;
    }
};

DWORD WINAPI ThreadFileRead(void *lpParameter) {
    AsyncFile *asf = (AsyncFile *)lpParameter;
    asf->setReadState(true);
    IOParams &read_params = *asf->getReadParams();
    fread(read_params.buf, 1, read_params.nBytesForRead, read_params.fp);
    asf->setReadState(false);
    return 0;
}
Maybe you know how to write the asynchronous reading in more reasonable way.
Maybe you know how to write the asynchronous reading in more reasonable way.
Since your question is tagged "Windows", you might look into FILE_FLAG_OVERLAPPED and ReadFileEx, which do asynchronous reading without extra threads (synchronisation via an event, a callback, or a completion port).
If you insist on using a separate loader thread (there may be valid reasons for that, though few), you do not want to read and write a flag repeatedly from two threads and use that for synchronisation. Although your code looks correct, the mere fact that it does not work as intended shows that it's a bad idea.
Always use a proper synchronisation primitive (event or semaphore) for synchronisation, do not tamper with some flag that's (possibly inconsistently) written and read from different threads.
Alternatively, if you don't want an extra event object, you could always wait on the thread to die, unconditionally (but, read the next paragraph).
Generally, spawning a thread and letting it die for every read is not a good design. Not only is spawning a thread considerable overhead (both in CPU and memory), it can also introduce hard-to-predict "funny effects" and turn out to be a total anti-optimization. Imagine, for example, having 50 threads thrashing the hard drive with seeks, all of them trying to grab a bit of it. That would be asynchronous for sure, but a hundred times slower, too.
Using a small pool of workers (emphasis on small) will probably be a much superior design, if you do not want to use the operating system's native asynchronous mechanisms.

Non-blocking TCP buffer issues

I think I have a problem. I have two TCP apps connected to each other which use Winsock I/O completion ports to send/receive data (non-blocking sockets).
Everything works just fine until there's a data transfer burst. The sender starts sending incorrect/malformed data.
I allocate the buffers I'm sending on the stack, and if I understand correctly, that's the wrong thing to do, because these buffers must remain as I sent them until I get the "write complete" notification from IOCP.
Take this for example:
void some_function()
{
    char cBuff[1024];

    // fill cBuff with some data
    WSASend(...); // send cBuff, non-blocking mode

    // fill cBuff with other data
    WSASend(...); // again, sending cBuff

    // ... and so forth
}
If I understand correctly, each of these WSASend() calls should have its own unique buffer, and that buffer can be reused only when the send completes.
Correct?
Now, what strategies can I implement in order to maintain a big sack of such buffers, how should I handle them, how can I avoid a performance penalty, etc.?
And if I am to use such buffers, that means I should copy the data to be sent from the source buffer into the temporary one; thus, I'd set SO_SNDBUF on each socket to zero so the system will not re-copy what I already copied. Are you with me? Please let me know if I wasn't clear.
Take a serious look at boost::asio. Asynchronous I/O is its specialty (just as the name suggests). It's a pretty mature library by now, having been in Boost since 1.35. Many people use it in production for very intensive networking. There's a wealth of examples in the documentation.
One thing is for sure: it takes working with buffers very seriously.
Edit:
Basic idea to handling bursts of input is queuing.
Create, say, three linked lists of pre-allocated buffers - one is for free buffers, one for to-be-processed (received) data, one for to-be-sent data.
Every time you need to send something - take a buffer off the free list (allocate a new one if free list is empty), fill with data, put it onto to-be-sent list.
Every time you need to receive something - take a buffer off the free list as above, give it to IO receive routine.
Periodically take buffers off to-be-sent queue, hand them off to send routine.
On send completion (inline or asynchronous) - put them back onto free list.
On receive completion - put buffer onto to-be-processed list.
Have your "business" routine take buffers off to-be-processed list.
The bursts will then fill that input queue until you are able to process them. You might want to limit the queue size to avoid blowing through all the memory.
I don't think it is a good idea to do a second send before the first send is finished.
Similarly, I don't think it is a good idea to change the buffer before the send is finished.
I would be inclined to store the data in some sort of queue. One thread can keep adding data to the queue. The second thread can work in a loop. Do a send and wait for it to finish. If there is more data do another send, else wait for more data.
You would need a critical section (or some such) to nicely share the queue between the threads and possibly an event or a semaphore for the sending thread to wait on if there is no data ready.
Now, what strategies can I implement in order to maintain a big sack of such buffers, how should I handle them, how can I avoid a performance penalty, etc.?
It's difficult to know the answer without knowing more about your specific design. In general I'd avoid maintaining your own "sack of buffers" and instead use the OS's built-in sack of buffers - the heap.
But in any case, what I would do in the general case is expose an interface to the callers of your code that mirrors what WSASend is doing for overlapped I/O. For example, suppose you are providing an interface to send a specific struct:
struct Foo
{
    int x;
    int y;
};

// foo will be consumed by SendFoo, and deallocated; don't use it after this call
void SendFoo(Foo* foo);
I would require users of SendFoo allocate a Foo instance with new, and tell them that after calling SendFoo the memory is no longer "owned" by their code and they therefore shouldn't use it.
You can enforce this even further with a little trickery:
// After this operation the resultant foo ptr will no longer point to
// memory passed to SendFoo
void SendFoo(Foo*& foo);
This allows the body of SendFoo to send the address of the memory down to WSASend, but modify the passed in pointer to NULL, severing the link between the caller's code and their memory. Of course, you can't really know what the caller is doing with that address, they may have a copy elsewhere.
This interface also enforces that a single block of memory will be used with each WSASend. You are really treading into dangerous territory if you try to share one buffer between two WSASend calls.